PREFACE


The object of this work is to provide a mathematical text on the 
Theory of Statistics, adapted to the needs of the student with an 
average mathematical equipment, including an ordinary knowledge 
of the Integral Calculus. The subject treated in the following 
pages is best described not as Statistical Methods but as Statistical 
Mathematics, or the mathematical foundations of the interpretation
of statistical data. The writer’s aim is to explain the underlying 
principles, and to prove the formulae and the validity of the 
methods which are the common tools of statisticians. Numerous 
examples are given to illustrate the use of these formulae; but, in 
nearly all cases, heavy arithmetic is purposely avoided in the desire 
to focus the attention on the principles and proofs, rather than on 
the details of numerical calculation. 

The treatment is based on a course of about sixty lectures on 
Statistical Mathematics, which the author has given annually in 
the University of Western Australia for several years. This course 
was undertaken at the request of the heads of some of the science 
departments, who desired for their students a more mathematical
treatment of the subject than that usually provided by courses
on Statistical Methods. The class has included graduates and 
undergraduates whose researches and studies were in Agriculture, 
Biology, Economics, Psychology, Physics and Chemistry. On 
account of such a diversity of interest the lectures were designed 
to provide a mathematical basis, suitable for work in any of the 
above subjects. No technical knowledge of any particular subject 
was assumed. 

The first five chapters deal with the properties of distributions 
in general, and of some standard distributions in particular. It is 
desirable that the student become familiar with these, before 
being confronted with the theory of sampling, in which he is
required to consider two or more distributions simultaneously. 
However, the reader who wishes to make an earlier start with 
sampling theory may take Chapters vi and vii (as far as §60)
immediately after Chapter iii, since these are independent of the
theory of Correlation. The theory of Partial and Multiple Correlations
has been left for the final chapter, in order not to delay the
study of sampling theory and tests of significance. But those who
wish to study this subject earlier may read this chapter immediately
after Chapter v, or even after Chapter iv. This order,
however, is not recommended for the beginner.

A feature of the book is the use of the properties of Beta and 
Gamma variates in proving the sampling distributions of the 
statistics, which are the basis of the common tests of significance. 
Consequently a special chapter (viii) is devoted to the properties
of these variates and their distributions. The treatment is simple,
and does not assume any previous acquaintance with the Beta and
Gamma functions. The author believes that the use of these
variates brings both simplicity and cohesion to the theory. The
student is strongly urged to master the theorems of Chapter viii
before proceeding to tests of significance. In the preparation of 
this chapter, and the following one, much help was derived from 
the study of a recent paper by D. T. Sawkins.* 

The considerations, which determined the presentation of the 
subject of Probability in Chapter ii, are the mathematical attainments
of the students for whom the book is intended, and their
requirements in studying the remaining chapters. The approach 
decided on is substantially that of the classical theory. After the 
proof of Bernoulli’s theorem, the relation between the a priori 
definition of probability and the statistical (or empirical) definition 
is considered; and the measure of probability by a relative fre- 
quency in each case is emphasized. If the book had been intended 
primarily for mathematical specialists, a different presentation of 
the theory would have been given. But it is futile to expect the 
average research worker to appreciate an exposition like
Kolmogoroff's† or Cramér's,‡ based on the theory of completely additive
set functions. The elementary properties of the moment generating
function and the cumulative function are also given in Chapter ii,
and are used throughout the book. These functions are introduced,
not as essential concepts, but as useful instruments which lead to
simpler proofs of various theorems. What has been written about
them should convince the reader that they deserve his careful
attention.

* 'Elementary presentation of the frequency distributions of certain
statistical populations associated with the normal population.' Journ. and
Proc. Roy. Soc. N.S.W. vol. 74, pp. 209-39. By D. T. Sawkins, Reader in
Statistics at Sydney University.

† Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933.

‡ Random Variables and Probability Distributions, University Press,
Cambridge, 1937.

I wish to express my appreciation of the care bestowed on this 
book by the staff of the Cambridge University Press, and my 
pleasure in the excellence of the printing. My thanks are due to 
Professor R. A. Fisher and Messrs Oliver and Boyd for permission 
to print Tables 3, 4, 5 and 7, which are drawn from fuller tables 
in Fisher’s Statistical Methods for Research Workers. I am also 
indebted to Professor G. W. Snedecor and the Iowa Collegiate 
Press for permission to reproduce Table 6, which is extracted from 
a more complete table in Snedecor’s Statistical Methods. Lastly 
I wish to thank Mr D. T. Sawkins for help received in corre- 
spondence concerning statistical theory, and Mr Frank Gamblen 
for assistance in reading the final proof. 

C.E.W. 


PERTH, W.A.
April, 1946


NOTE ON THE SECOND EDITION 

The call for reprinting has given an opportunity to correct a 
number of small errors and misprints throughout the book, and 
to add a new reference here and there. Paragraph 91 on page 195 
is new to this edition. 


1949 


C.E.W. 




CHAPTER I 


FREQUENCY DISTRIBUTIONS 

1. Arithmetic mean. Partition values 

Consider a group of N persons in receipt of wages. Let x shillings 
be the wage of an individual on some specified day. Then, in general, 
x will be a variable whose value changes with the individual.
Possibly the N values of x will not all be different. Suppose there are
only n different values $x_1, x_2, \dots, x_n$, which occur respectively
$f_1, f_2, \dots, f_n$ times. The numbers $f_i$, in which the subscript i
takes the positive integral values 1, 2, ..., n, are called the
frequencies of the values $x_i$ of the variable x; and the assemblage of
values $x_i$, with their associated frequencies, is the frequency
distribution of wages for that group of persons on the day specified.
The sum of the frequencies is clearly equal to the number of persons in
the group, so that

$$N = f_1 + f_2 + \dots + f_n = \sum_{i=1}^{n} f_i, \tag{1}$$

where, as the equation indicates, $\sum_{i=1}^{n} f_i$ denotes the sum of
the n frequencies $f_i$, i taking the integral values 1 to n. When the
range of values of i is understood, the sum will be denoted simply by
$\sum_i f_i$ or $\sum f_i$. The number N is the total frequency. The mean
of the distribution is the arithmetic mean $\bar{x}$ of the N values of
the variable, and is therefore given by

$$\bar{x} = \frac{1}{N}(f_1 x_1 + f_2 x_2 + \dots + f_n x_n) = \frac{1}{N}\sum_i f_i x_i, \tag{2}$$

since the value $x_i$ occurs $f_i$ times. This formula expresses what is
meant by saying that $\bar{x}$ is the weighted mean of the different
values $x_i$, whose weights are their frequencies $f_i$.

Frequency distributions of many different variables will occur in
the following pages. Thus the values of the variable x may be the
heights, or the weights, or the ages of a group of persons, or the
yields of grain per acre from a number of plots of land. For each
finite distribution $f_i$ will denote the frequency of the value $x_i$.
The total frequency N is then given by (1), and the mean $\bar{x}$ of the
distribution by (2).
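
As a minimal sketch of equations (1) and (2), the following Python fragment computes the total frequency and the mean of a small frequency distribution. The wage values and frequencies are invented purely for illustration and do not come from the text; the function names are likewise hypothetical.

```python
from fractions import Fraction

# Hypothetical frequency distribution: wage x_i (shillings) -> frequency f_i.
distribution = {10: 3, 12: 5, 15: 2}

def total_frequency(dist):
    """Equation (1): N is the sum of the frequencies f_i."""
    return sum(dist.values())

def mean(dist):
    """Equation (2): weighted mean of the values x_i with weights f_i."""
    n = total_frequency(dist)
    return Fraction(sum(f * x for x, f in dist.items()), n)

N = total_frequency(distribution)
x_bar = mean(distribution)
print(N, x_bar)
```

Exact rational arithmetic is used so that the result can be compared with a hand calculation without rounding.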

Example. The student to whom the above summation notation is new
may profitably verify the following relations. If a is a constant,

$$\sum_i a f_i x_i = a \sum_i f_i x_i,$$

$$\sum_i f_i (x_i + a) = \sum_i f_i x_i + Na.$$

[A third relation, involving three sums over i, is illegible in the source.]

If $(x_i, y_i)$ is a pair of corresponding values of two variables, x and y,
with frequency $f_i$,

$$\sum_i f_i (x_i + y_i) = \sum_i f_i x_i + \sum_i f_i y_i.$$

The symbol used as a subscript in connection with summation is immaterial;
but i, j, r, s, t are perhaps most commonly employed.

Suppose that the frequency distribution of x consists of k partial
or component distributions, $\bar{x}_j$ being the mean of the jth
component and $n_j$ its total frequency, so that

$$N = \sum_{j=1}^{k} n_j.$$

Then that part of the sum $\sum_i f_i x_i$ which belongs to the jth
component has the value $n_j \bar{x}_j$, and the relation (2) is
equivalent to

$$\bar{x} = \frac{1}{N} \sum_{j=1}^{k} n_j \bar{x}_j. \tag{3}$$

Consequently the mean of the whole distribution is the weighted
mean of the means of its components, the weights being the total
frequencies in those components.
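
Relation (3) can be checked numerically. The sketch below pools three component distributions and compares the weighted mean of the component means with the mean of the pooled values. The data and the helper `mean` are hypothetical, chosen only to illustrate the identity.

```python
from fractions import Fraction

# Hypothetical components: each is a list of raw values of x.
components = [[2, 4, 6], [10, 20], [1, 1, 1, 5]]

def mean(values):
    return Fraction(sum(values), len(values))

# Frequencies n_j and means of the components.
n = [len(c) for c in components]
component_means = [mean(c) for c in components]

# Relation (3): weighted mean of the component means.
N = sum(n)
weighted = sum(nj * mj for nj, mj in zip(n, component_means)) / N

# Direct mean of the pooled distribution, for comparison.
pooled = mean([x for c in components for x in c])
print(weighted, pooled)
```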

Again, let u and v be two variables with frequency distributions 
in which a value of v corresponds to each value of u. Then the values 
of the variables occur in pairs. Let N be the number of pairs of 
values $(u_i, v_i)$, and let

$$x_i = u_i + v_i.$$

The arithmetic mean of the N values of x is equal to that of the N
values of the second member, which is $\bar{u} + \bar{v}$. Consequently

$$\bar{x} = \bar{u} + \bar{v}, \tag{4}$$

which expresses that the mean of the sum of two variables is equal
to the sum of their means; and the result can be extended to the sum
of any number of variables. The reader can prove similarly that, if
a and b are constants, and

$$x = au + bv,$$

then

$$\bar{x} = a\bar{u} + b\bar{v}. \tag{4'}$$

Suppose the N values of the variable in the distribution to be 
arranged in ascending order of magnitude. Then the median is the 
middle value, if N is odd; while, if N is even, it is the arithmetic
mean of the middle pair, or, more generally, it may be regarded as
any value in the interval between these middle values. Similarly,
the quartiles, $Q_1$, $Q_2$, $Q_3$, are those values in the range of the
variable which divide the frequency into four equal parts, the second
quartile being identical with the median; and the difference between the
upper and lower quartiles, $Q_3 - Q_1$, is the interquartile range. The
deciles and percentiles are those values which divide the total fre- 
quency into ten and one hundred equal parts respectively. The 
median, quartiles, deciles and percentiles are often spoken of 
collectively as partition values, since each set of values divides the 
frequency into a number of equal parts. Sometimes they are 
referred to as quantiles. 

That value of the variable whose frequency is a maximum is 
called a mode, or modal value, of the distribution. When, as usually 
happens, there is only one mode, the distribution is said to be 
unimodal. 

We shall presently consider continuous frequency distributions. 
But it should be pointed out at once that the variable may be either
continuous or discrete. A continuous variable is one which is capable
of taking any value between certain limits; for example, the stature
of an adult man. A discrete variable is one which can take only
certain specified values, usually positive integers; for example, the
number of heads in a throw of ten coins, or the number of accidents
sustained by a worker exposed to a given risk for a given time. Of 
course an observed frequency distribution can only contain a finite 
number of values of the variable, and in this sense all observed 
frequency distributions are discrete. Nevertheless, the distinction 
between continuous and discrete variables will be found to be of 
importance when we come to study populations and probability 
distributions. In the next few sections we shall assume that the 
values of the variables are discrete. 

2. Change of origin and unit 

The following graphical representation will be found helpful. 
Taking the usual x-axis with origin O, we may represent the variable
x by the abscissa of the current point P. Then $\bar{x}$ is the abscissa
of a fixed point G.

[Fig. 1: the points O, A, G, P marked in that order along the x-axis]

It is frequently convenient to take a new origin at some point A, whose
abscissa is a. Let $\xi$ be the abscissa of P relative to A as origin.
Then, since OP = OA + AP, we have

$$x = a + \xi.$$

Thus $\xi$ is the excess of x above a, or the deviation of x from that
value. Taking the mean of each member of this equation we have

$$\bar{x} = a + \bar{\xi}. \tag{5}$$

Thus $\bar{x} - a$, which is the deviation of the mean value of x from a,
is equal to $\bar{\xi}$, which is the mean of the deviations of the values
$x_i$ from a. In particular, by taking a as $\bar{x}$, we have the result
that the sum of the deviations of the values $x_i$ from their mean is
zero. This is also easily proved directly; for

$$\sum_i f_i (x_i - \bar{x}) = \sum_i f_i x_i - \bar{x} \sum_i f_i = 0 \tag{6}$$

in virtue of (2).

In addition to choosing A as origin it may be convenient to use
a different unit, say c times the original unit. Then, if u is the
deviation of P from A measured in terms of this new unit,

$$u = (x - a)/c$$

or

$$x = a + cu. \tag{7}$$

Taking the mean value of the variable represented by either side
we have, in virtue of (4'),

$$\bar{x} = a + c\bar{u}, \tag{8}$$

$\bar{u}$ being the mean value of u for the distribution.

Example. Eight coins were tossed together, and the number x of heads 
resulting was observed. The operation was performed 256 times; and the 
frequencies that were obtained for the different values of x are shown in the 
following table. Calculate the mean, the median and the quartiles of the 
distribution of x. 


   x      f      ξ     fξ    fξ²    fξ³

   0      1     -4     -4     16    -64
   1      9     -3    -27     81   -243
   2     26     -2    -52    104   -208
   3     59     -1    -59     59    -59
   4     72      0      0      0      0
   5     52      1     52     52     52
   6     29      2     58    116    232
   7      7      3     21     63    189
   8      1      4      4     16     64

Totals  256            -7    507    -37


The different values of x are shown in the first column, and their frequencies
in the second. The calculation is simplified by taking the value x = 4 as
origin of $\xi$. The values of $\xi$ corresponding to those of x are given in
the third column, and those of the product $f\xi$ in the fourth. The remaining
columns are not needed for the present. Totals for the various columns are
given in the bottom row. Hence

$$\bar{\xi} = \frac{1}{N}\sum_i f_i \xi_i = -7/256 = -0.027.$$

The mean value of x is therefore

$$\bar{x} = a + \bar{\xi} = 4 - 0.027 = 3.973.$$

The mode, being the value of x with the largest frequency, is clearly 4. To
find the median we observe that the values of x are arranged in ascending
order, and that the 128th and 129th are both 4. Hence the median is also 4.
Similarly the 64th and 65th values are both 3, so that the lower quartile is 3.
In the same way the 192nd and 193rd values are both 5, so that the upper
quartile is 5.
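
The calculation in this example can be reproduced with a short Python sketch. The frequencies are taken to be 1, 9, 26, 59, 72, 52, 29, 7, 1 for x = 0, ..., 8 (a reading of the table that sums to 256 and agrees with the column totals); the helper `middle_pair_mean` is a hypothetical name introduced here for illustration.

```python
from fractions import Fraction

# Frequencies of x = number of heads (assumed reading of the table).
freq = {0: 1, 1: 9, 2: 26, 3: 59, 4: 72, 5: 52, 6: 29, 7: 7, 8: 1}
N = sum(freq.values())

# Mean via the working origin a = 4: x_bar = a + mean of xi, xi = x - a.
a = 4
xi_bar = Fraction(sum(f * (x - a) for x, f in freq.items()), N)
x_bar = a + xi_bar

# Expand to the N ordered values so that ranks can be read off directly.
ordered = [x for x in sorted(freq) for _ in range(freq[x])]

def middle_pair_mean(values, rank):
    """Mean of the values of rank `rank` and `rank + 1` (1-based ranks)."""
    return Fraction(values[rank - 1] + values[rank], 2)

median = middle_pair_mean(ordered, N // 2)              # 128th and 129th
lower_quartile = middle_pair_mean(ordered, N // 4)      # 64th and 65th
upper_quartile = middle_pair_mean(ordered, 3 * N // 4)  # 192nd and 193rd
mode = max(freq, key=freq.get)
print(float(x_bar), median, lower_quartile, upper_quartile, mode)
```

Expanding the distribution into its N ordered values is wasteful for large N, but it mirrors the verbal argument about the 128th and 129th values most directly.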
3. Variance. Standard deviation

The mean square deviation of the variable x from the value a is,
as the name implies, the mean value of the square of the deviation
of x from a. It is therefore given by $\frac{1}{N}\sum_i f_i \xi_i^2$. The
positive square root of this quantity is the root-mean-square deviation
from a. In the important case in which the deviation is taken from the
mean of the distribution, the mean square deviation is called the variance
of x, and is denoted by $\mu_2$. The reason for the notation will appear in
the next section. The positive square root of the variance is called
the standard deviation (s.d.) of x, and is denoted by $\sigma$. Thus

$$\mu_2 = \sigma^2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2. \tag{9}$$

The variance (or the s.d.) may be taken as an indication of the
extent to which the values of x are scattered. This scattering is
called dispersion. When the values of x cluster closely round the
mean, the dispersion is small. When those values, whose deviations
from the mean are large, have also relatively large frequencies, the
dispersion is large. The concepts of variance and s.d. will play a
prominent part in the following pages.

When the mean square deviation from any value a is known, and
also the deviation $\bar{\xi}$ of the mean from that value, the variance is
easily calculated. For, since $x_i - \bar{x} = \xi_i - \bar{\xi}$ by (5),

$$\sigma^2 = \frac{1}{N}\sum_i f_i (\xi_i - \bar{\xi})^2 = \frac{1}{N}\sum_i f_i \xi_i^2 - 2\bar{\xi}\,\frac{1}{N}\sum_i f_i \xi_i + \bar{\xi}^2 = \frac{1}{N}\sum_i f_i \xi_i^2 - \bar{\xi}^2. \tag{10}$$

This formula is of great importance, and will be constantly employed.
On multiplying by N we have an equivalent relation, which may be
expressed

$$\sum_i f_i (x_i - a)^2 = \sum_i f_i \xi_i^2 = N\sigma^2 + N\bar{\xi}^2. \tag{11}$$

Thus $N\sigma^2$ is less than $\sum_i f_i \xi_i^2$, showing that the sum of
the squares of the deviations of the values $x_i$ is least when the
deviations are measured from the mean.
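
Formula (10) and the minimum property that follows it may be verified numerically. The sketch below uses an illustrative distribution (the coin-tossing frequencies of § 2 as read here; any finite distribution would serve) and a hypothetical helper `mean_square_deviation`.

```python
from fractions import Fraction

# Illustrative distribution (values -> frequencies); any finite one will do.
freq = {0: 1, 1: 9, 2: 26, 3: 59, 4: 72, 5: 52, 6: 29, 7: 7, 8: 1}
N = sum(freq.values())
x_bar = Fraction(sum(f * x for x, f in freq.items()), N)

def mean_square_deviation(a):
    """Mean value of (x - a)^2 over the distribution."""
    return sum(f * (x - a) ** 2 for x, f in freq.items()) / Fraction(N)

# Variance: mean square deviation taken from the mean itself.
variance = mean_square_deviation(x_bar)

# Formula (10): variance = mean square deviation about a, less xi_bar^2.
a = 4
xi_bar = x_bar - a
shortcut = mean_square_deviation(a) - xi_bar ** 2
print(float(variance), float(shortcut))
```

The working origin a = 4 keeps the squared deviations small integers, which is the practical point of (10) in hand computation.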
Another possible measure of dispersion is the mean value of the
absolute deviation from the mean of the distribution, commonly
called the mean deviation from the mean. This quantity, however,
does not lend itself readily to algebraical treatment, and is therefore
not nearly so important as the variance and the s.d. The
semi-interquartile range is also sometimes taken as an indication of the
magnitude of the dispersion.

The significance of the magnitude of the standard deviation
clearly depends upon the values of the variable. Thus a s.d. of 6 in.
in the measurements of the height of a tower, is much less significant
than an equal s.d. in the measurements of the height of a man. The
ratio of the s.d. to the mean value of the variable is called the
coefficient of variation. It is an absolute measure of dispersion in the
sense that it is independent of the unit employed. And by means of
this coefficient we are able to compare the variabilities of distributions
of different characteristics, such as weight and height. Sometimes
the coefficient of variation is defined as 100 times the above
value, i.e. as the percentage of the mean which is equal to the s.d.

Example 1. In the example of the preceding section the mean square
deviation of x from the value 4 is 507/256 = 1.98. Hence the variance is
given by

$$\sigma^2 = 1.98 - \bar{\xi}^2 = 1.98 - (0.027)^2 = 1.98,$$

and $\sigma$ = 1.407 = 1.41 nearly.

From the $f\xi$ column it is clear that the sum of the absolute deviations
from x = 4 is 277. By measuring deviations from the mean, instead of from
x = 4, we increase the absolute deviations of 161 values, and decrease those
of 95 values, by 0.027. Hence the sum of the absolute deviations from the
mean is 277 + 66(0.027) = 278.78. The mean deviation is therefore 1.09
approximately.

Example 2. Find the mean and the variance for the distribution in which
the values of x are the positive integers 1, 2, 3, ..., N, the frequency of
each being unity.

Here

$$\bar{x} = \frac{N(N+1)}{2N} = \tfrac{1}{2}(N+1).$$

The mean square deviation from x = 0 is

$$(1^2 + 2^2 + \dots + N^2)/N = \tfrac{1}{6}(N+1)(2N+1).$$

Hence

$$\sigma^2 = \tfrac{1}{6}(N+1)(2N+1) - \tfrac{1}{4}(N+1)^2 = \tfrac{1}{12}(N^2 - 1).$$

Example 3. For the distribution expressed by

x =  5   6   7   8   9  10  11  12  13  14  15
f = 18  25  34  47  68  90  80  62  38  27  11

the total frequency is 500. Show that the mean value of x is 10.054, the
variance 5.58, the s.d. 2.36, the median 10, and the lower and upper
quartiles 9 and 12 respectively. Also calculate the mean deviation from the
mean as in Ex. 1.
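
The results asked for in Example 3 can be checked with the following sketch. It assumes the frequencies 18, 25, 34, 47, 68, 90, 80, 62, 38, 27, 11 for x = 5, ..., 15; this reading totals 500 and reproduces the stated mean of 10.054, but it is an assumption about the printed table rather than something the scan settles.

```python
import math
from fractions import Fraction

xs = range(5, 16)
fs = [18, 25, 34, 47, 68, 90, 80, 62, 38, 27, 11]  # assumed reading
N = sum(fs)

mean = Fraction(sum(f * x for x, f in zip(xs, fs)), N)
variance = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / N
sd = math.sqrt(variance)
mean_deviation = sum(f * abs(x - mean) for x, f in zip(xs, fs)) / N

# Ordered list of the N values, used to read off the partition values.
ordered = [x for x, f in zip(xs, fs) for _ in range(f)]
median = Fraction(ordered[N // 2 - 1] + ordered[N // 2], 2)
lower_quartile = Fraction(ordered[N // 4 - 1] + ordered[N // 4], 2)
upper_quartile = Fraction(ordered[3 * N // 4 - 1] + ordered[3 * N // 4], 2)
print(float(mean), float(variance), sd, float(mean_deviation))
```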

Example 4. A distribution consists of several component distributions. 
Express the variance of the whole distribution in terms of those of the 
components and the deviations of the means of the components from the 
general mean. 

Let $n_j$ be the frequency in the jth component, $\sigma_j$ its s.d., and
$c_j$ the deviation of its mean from the general mean. Then the mean square
deviation of this component from the general mean is $\sigma_j^2 + c_j^2$,
and the sum of the squares of its deviations from the general mean is
$n_j(\sigma_j^2 + c_j^2)$. Hence the variance $\sigma^2$ of the whole
distribution is given by

$$N\sigma^2 = \sum_j n_j(\sigma_j^2 + c_j^2),$$

where N is the total frequency $\sum_j n_j$.
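
The result of Example 4 is an exact identity, and may be confirmed on any data. The following sketch does so with three hypothetical components; the values and the helper names `mean` and `variance` are illustrative assumptions only.

```python
from fractions import Fraction

# Hypothetical component distributions, given as raw values.
components = [[1, 2, 3], [4, 4, 6, 10], [7, 9]]

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

pooled = [v for c in components for v in c]
general_mean = mean(pooled)
N = len(pooled)

# N * sigma^2 = sum over components of n_j * (sigma_j^2 + c_j^2),
# where c_j is the deviation of the component mean from the general mean.
rhs = sum(
    len(c) * (variance(c) + (mean(c) - general_mean) ** 2)
    for c in components
)
lhs = N * variance(pooled)
print(lhs, rhs)
```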

4. Moments 

In the notation of the preceding sections the mean value of the
rth power of the deviation of the variable from the value a is
$\frac{1}{N}\sum_i f_i \xi_i^r$. This is usually called the rth moment of
the distribution about the value a, or the moment of order r. The term
'moment' is borrowed from Mechanics. Since $f_i/N$ is the relative frequency
of the value $x_i$ in the distribution, and the deviation $\xi_i$ from a is
represented by the distance AP, the above expression can be regarded as the
sum of the rth moments of the relative frequencies about A. The
rth moment about the mean of the distribution is denoted by $\mu_r$.
The corresponding moment about a specified value other than the
mean will be denoted* by $\mu_r'$. Thus

$$\mu_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r \tag{12}$$

is the rth moment about the mean, while

$$\mu_r' = \frac{1}{N}\sum_i f_i (x_i - a)^r = \frac{1}{N}\sum_i f_i \xi_i^r \tag{13}$$

is the rth moment about the value a. Putting r = 0 we see that

$$\mu_0 = \mu_0' = 1. \tag{14}$$

Similarly, in virtue of (2) and (6), we have

$$\mu_1' = \bar{\xi}, \qquad \mu_1 = 0. \tag{15}$$

The second moment about the mean is clearly the variance already
discussed.

* An alternative notation, with $\nu_r$ instead of $\mu_r'$, has some advantages.

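By means of the binomial expansion, moments about the mean
of the distribution may be expressed in terms of moments about
any other value, x = a. Thus

$$\mu_r = \frac{1}{N}\sum_i f_i (\xi_i - \bar{\xi})^r = \mu_r' - \binom{r}{1}\bar{\xi}\,\mu_{r-1}' + \binom{r}{2}\bar{\xi}^2\mu_{r-2}' - \dots, \tag{16}$$

$\binom{r}{s}$ denoting the binomial coefficient, often written $^rC_s$ or
$C^r_s$. In particular, in virtue of (14) and (15),

$$\mu_2 = \mu_2' - 2\bar{\xi}\,\mu_1' + \bar{\xi}^2\mu_0' = \mu_2' - \bar{\xi}^2, \tag{17}$$

in agreement with (10). Similarly

$$\mu_3 = \mu_3' - 3\bar{\xi}\,\mu_2' + 2\bar{\xi}^3,$$
$$\mu_4 = \mu_4' - 4\bar{\xi}\,\mu_3' + 6\bar{\xi}^2\mu_2' - 3\bar{\xi}^4, \tag{18}$$

and so on.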

In calculating moments it is frequently convenient to change the
unit. As in § 2, let u be the measure of the deviation from x = a, in
terms of a unit c times the original unit, so that $\xi = cu$. Then the
rth moment of x about a is

$$\frac{1}{N}\sum_i f_i \xi_i^r = \frac{1}{N}\sum_i f_i (c u_i)^r = c^r\,\frac{1}{N}\sum_i f_i u_i^r. \tag{19}$$

Thus the rth moment of the variable x is $c^r$ times the corresponding
moment of the variable u.

A distribution is said to be symmetrical when the frequencies are
symmetrically distributed about the mean, that is to say, when
values equidistant from the mean have equal frequencies. For
example, the distribution expressed by

x = 0   1   2   3   4   5   6   7   8
f = 1   8  28  56  70  56  28   8   1

is symmetrical about its mean x = 4. In the case of a symmetrical
distribution there is the simplification that all the moments of odd
order about the mean are equal to zero, since the terms of the sum
in (12) cancel in pairs. In the case of an unsymmetrical distribution,
the degree of departure from symmetry is called its skewness.
More than one measure of this property has been proposed. One of
the simplest is $\mu_3/\sigma^3$, while another is half this expression.
These are clearly independent of the unit chosen for the variable, and they
vanish if the distribution is symmetrical. Another measure of
skewness, proposed by Karl Pearson, will be given later.

Example 1. For the distribution of x in the example of § 2, the third moment
about $\xi$ = 0 is

$$\mu_3' = -37/256 = -0.1445.$$

Hence, by (18), the third moment about the mean is given by

$$\mu_3 = -0.1445 - 3(-0.02734)(1.9805) + 2(-0.02734)^3 = 0.018 \text{ nearly}.$$

The skewness, calculated from the formula $\mu_3/\sigma^3$, is

0.018/2.8 = 0.0064,

which is very small.

Example 2. For the distribution in § 3, Ex. 3, show that $\mu_3 = -1.92$, and
deduce that $\mu_3/\sigma^3 = -0.146$.
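
The passage from moments about a working origin to moments about the mean, by the binomial expansion (16), lends itself to a short check. The sketch below applies it to the coin-tossing distribution of § 2, with the frequencies as read here (an assumption about the printed table); `moment_about` and `central_from_raw` are hypothetical names.

```python
from fractions import Fraction
from math import comb

freq = {0: 1, 1: 9, 2: 26, 3: 59, 4: 72, 5: 52, 6: 29, 7: 7, 8: 1}
N = sum(freq.values())

def moment_about(a, r):
    """r-th moment about the value a, as in (13)."""
    return sum(f * (x - a) ** r for x, f in freq.items()) / Fraction(N)

def central_from_raw(a, r):
    """r-th moment about the mean via the binomial expansion (16)."""
    xi_bar = moment_about(a, 1)
    return sum(
        (-1) ** s * comb(r, s) * xi_bar ** s * moment_about(a, r - s)
        for s in range(r + 1)
    )

x_bar = 4 + moment_about(4, 1)
mu2 = central_from_raw(4, 2)
mu3 = central_from_raw(4, 3)
skewness = float(mu3) / float(mu2) ** 1.5   # mu_3 / sigma^3
print(float(mu2), float(mu3), skewness)
```

Because the arithmetic is exact, the expansion can be compared term for term with a direct computation of the moments about the mean.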

5. Grouped distribution

Frequently the number of different values of the variable represented
in the distribution is so large that, for convenience in calculating
the moments, it becomes necessary to approximate by
grouping the values. In such cases the range of variation of x is
usually divided into a number of equal intervals. The group of
values falling in a given interval constitutes a class; and the number
of such values is the class frequency. The magnitude of an interval
is called the class interval. For simplicity of calculation the number
of intervals chosen will not be too large, preferably not more than
20, or at the most 25; and, in order that the results may be sufficiently
accurate, the number must not be too small, preferably not
less than 12. The approximation in the calculation of moments
consists in regarding each value as equal to the mid-value of the
interval in which it falls. We shall later meet formulae,* known as
Sheppard's adjustments, giving the corrections that may be applied
under certain conditions to the approximate values of the moments
calculated as above. We merely mention here that, under the
conditions referred to, the correction to the mean may be neglected,
while the calculated variance should be reduced by $c^2/12$, c being
the magnitude of the class interval. The following is an example of
the treatment of a distribution by grouping.

Example. Over a period of years, 570 students were examined in
Mathematics I at the annual examinations of the University of Western
Australia. The marks gained by students ranged from 0 to 99, all being
integers. These were grouped in 20 classes, with a class interval of 5, the
class frequencies being as shown in the accompanying table. The mid-values of
the intervals are 2, 7, 12, ..., 97. The calculation is simplified by taking
the class interval as a new unit (c = 5), and the value x = 52 as a new
origin. The deviations u from this origin, measured in class intervals, are
shown in the fourth column. The above choice of origin ensures that the larger
frequencies are multiplied by the smaller values of u. In the row
corresponding to u = 0 the entries for fu, fu², etc., are all zero, and need
not be recorded. The space may therefore be used to record the sums of the
negative numbers above this row. The sums of the positive numbers are
similarly indicated at the bottom; and the total for each column is given in
the last row.

Percentages in Mathematics

Interval   Mid-value    f     u      fu      fu²       fu³

 0 to  4       2       12   -10    -120    1,200   -12,000
 5 to  9       7       13    -9    -117    1,053    -9,477
10 to 14      12       13    -8    -104      832    -6,656
15 to 19      17       14    -7     -98      686    -4,802
20 to 24      22       23    -6    -138      828    -4,968
25 to 29      27       23    -5    -115      575    -2,875
30 to 34      32       29    -4    -116      464    -1,856
35 to 39      37       34    -3    -102      306      -918
40 to 44      42       44    -2     -88      176      -352
45 to 49      47       44    -1     -44       44       -44
50 to 54      52       50     0  -1,042            -43,948
55 to 59      57       52     1      52       52        52
60 to 64      62       61     2     122      244       488
65 to 69      67       41     3     123      369     1,107
70 to 74      72       32     4     128      512     2,048
75 to 79      77       27     5     135      675     3,375
80 to 84      82       23     6     138      828     4,968
85 to 89      87       17     7     119      833     5,831
90 to 94      92       13     8     104      832     6,656
95 to 99      97        5     9      45      405     3,645
                                    966             28,170

Totals                570           -76   10,914   -15,778

* See § 16.

The mean value of u is therefore

$$\bar{u} = \frac{1}{N}\sum_i f_i u_i = -76/570 = -2/15 = -0.133,$$

and the mean percentage

$$\bar{x} = a + c\bar{u} = 52 - 0.667 = 51.333.$$

The mean square deviation of u from u = 0 has the value 10,914/570 = 19.15;
and the variance of u, which may be denoted by $\sigma_u^2$, is

$$19.15 - \bar{u}^2 = 19.13.$$

Consequently $\sigma_u$ = 4.374, and therefore $\sigma_x$ = 21.87. If, however,
Sheppard's adjustment is made, we find

$$\sigma_u^2 = 19.047, \qquad \sigma_u = 4.364, \qquad \sigma_x = 21.82.$$

The third moment of u about u = 0 has the value -15,778/570 = -27.68.
Thus, in virtue of (18),

$$\mu_3 = -27.68 + 7.65 - 0.005 = -20.03.$$

The skewness of the distribution, calculated from the formula
$\mu_3/\sigma^3$, is -0.24 nearly. The reader may verify that the mean
deviation of x from the mean is about 17.7.

The partition values may be estimated on the assumption that the frequency
in any class is evenly distributed among the values in that class. The
reader will find in this way that the median is 53, and the lower and upper
quartiles 37 and 66 respectively.
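
The whole of this grouped calculation can be sketched in Python. The class frequencies below are those implied by the products in the table (they total 570); the treatment of each class as covering the half-unit-extended range, e.g. 49.5 to 54.5 for the class 50 to 54, is an assumption made here for the interpolation and is not stated in the text. The name `partition_value` is hypothetical.

```python
import math
from fractions import Fraction

# Class frequencies for the intervals 0-4, 5-9, ..., 95-99, as implied by
# the products in the table (they total 570).
f = [12, 13, 13, 14, 23, 23, 29, 34, 44, 44,
     50, 52, 61, 41, 32, 27, 23, 17, 13, 5]
c, a = 5, 52                      # class interval and working origin
mid = [2 + c * k for k in range(len(f))]
u = [(m - a) // c for m in mid]   # deviations in class intervals
N = sum(f)

def raw_moment(r):
    """r-th moment of u about u = 0."""
    return Fraction(sum(fi * ui ** r for fi, ui in zip(f, u)), N)

u_bar = raw_moment(1)
x_bar = a + c * u_bar
var_u = raw_moment(2) - u_bar ** 2
var_u_adj = var_u - Fraction(1, 12)   # Sheppard: c^2/12, with c = 1 in u-units
sd_x = c * math.sqrt(var_u)
mu3 = raw_moment(3) - 3 * u_bar * raw_moment(2) + 2 * u_bar ** 3
skewness = float(mu3) / float(var_u) ** 1.5

def partition_value(p):
    """Value below which a fraction p of the frequency lies, assuming an
    even spread within each class (class k covers 5k - 0.5 to 5k + 4.5)."""
    target, cum = p * N, 0
    for k, fk in enumerate(f):
        if cum + fk >= target:
            return c * k - 0.5 + c * (target - cum) / fk
        cum += fk

print(float(x_bar), sd_x, skewness)
print(partition_value(0.25), partition_value(0.5), partition_value(0.75))
```

Working in units of the class interval keeps every product an integer, exactly as in the table; the conversion back to marks is made only at the end, by (8) and (19).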

6. Continuous distributions

The distributions considered so far are discrete distributions, or
distributions of discrete values of the variable. A continuous
distribution is one in which the variable takes every value between
certain limits, a and b. The total frequency is therefore infinite; and
so is the frequency within any finite interval, $\alpha$ to $\beta$, of the
range of the variable. We shall confine our attention to continuous
distributions which are such that the relative frequency in the
infinitesimal interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$ is
expressible as $f(x)\,dx$, where $f(x)$ is a continuous function of x,
called the relative frequency density. Then the continuous curve

$$y = f(x) \tag{20}$$

is the relative frequency curve for the distribution. If this curve is
symmetrical about some line x = c, the distribution is said to be
symmetrical. The infinitesimal interval $x - \tfrac{1}{2}dx$ to
$x + \tfrac{1}{2}dx$, with mid-value x and magnitude dx, may be conveniently
referred to as the interval dx. The relative frequency $f(x)\,dx$ in this
interval is represented by the area under the curve (20), between the
ordinates at the ends of the interval. Hence the relative frequency for the
interval $\alpha$ to $\beta$ is given by the integral
$\int_\alpha^\beta f(x)\,dx$. The sum of all the relative frequencies is
unity, so that

$$\int_a^b f(x)\,dx = 1. \tag{21}$$

As in the case of a discrete distribution, the moment of order r
about a specified value is the sum of the rth moments of the relative
frequencies about that value. Hence the mean $\bar{x}$, which is the first
moment about the origin, is given by

$$\bar{x} = \int_a^b x f(x)\,dx. \tag{22}$$

The moments of order r, about the origin and the mean, are

$$\mu_r' = \int_a^b x^r f(x)\,dx, \qquad \mu_r = \int_a^b (x - \bar{x})^r f(x)\,dx.$$

And, in virtue of the binomial expansion, it follows as in § 4 that
the relations (16)-(18) hold for a continuous distribution also. In
particular, the variance of the distribution is given by

$$\sigma^2 = \mu_2 = \int_a^b x^2 f(x)\,dx - \bar{x}^2. \tag{23}$$

Example 1. By way of illustration consider a straight rod of length l.
The distance x of a molecule of the rod from one end may be regarded as a
continuous variable, ranging from 0 to l; and the distribution of the values
of x is then continuous. If M is the mass of the rod, and the linear density m
is continuous, the function f(x) for the distribution has the value m/M.

In the case of a uniform rod, of length 2a, the distance of a molecule from
the middle point varies continuously from -a to a, and the distribution of
x is continuous with f(x) = 1/2a. The frequency curve is a straight line
parallel to the x-axis, and the distribution of x is said to be rectangular
or uniform. The mean value of x is now zero, and the variance is given by

$$\sigma^2 = \frac{1}{2a}\int_{-a}^{a} x^2\,dx = \frac{a^2}{3}.$$

All the moments of odd order about the mean are zero. The moment of order
2n is easily shown to be $a^{2n}/(2n+1)$.

Example 2. Draw the frequency curve for the symmetrical distribution
in which

$$f(x) = \frac{2a}{\pi(a^2 + x^2)},$$

the range of the variable being -a to a; and show that f(x) satisfies the
condition (21). Also show that the variance is

$$\sigma^2 = a^2(4 - \pi)/\pi = 0.273a^2,$$

and the s.d. $\sigma$ = 0.52a, while

$$\mu_4 = a^4(1 - 8/3\pi) = 0.151a^4.$$
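
The results quoted in Example 2 can be checked by numerical integration. The sketch below assumes the density $f(x) = 2a/\pi(a^2 + x^2)$ on the range -a to a, a form chosen because it reproduces the stated variance and fourth moment; it should be read as an assumption about the example rather than a quotation from it. The integration routine `simpson` is a plain composite Simpson's rule written for this illustration.

```python
import math

def simpson(g, lo, hi, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

a = 1.0  # half-range; the moments scale as a^2 and a^4

def f(x):
    # Assumed density, chosen because it reproduces the stated results.
    return 2 * a / (math.pi * (a * a + x * x))

area = simpson(f, -a, a)                            # condition (21)
variance = simpson(lambda x: x * x * f(x), -a, a)   # mean is 0 by symmetry
mu4 = simpson(lambda x: x ** 4 * f(x), -a, a)
print(area, variance, math.sqrt(variance), mu4)
```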

It is apparent from the above that a discrete distribution cannot 
be identified with a continuous one. Sometimes, however, a discrete 
distribution 2>i approximates to a certain continuous distribution 
in the sense that, for any interval in the range of the variable, the 
relative ficequenoy of is approximately equal to that of D^. 
Such an approximation requires that the total frequeni^ N of Di 
should be large, and that there should be only very small intervals 
between pairs of adjacent values of the variable. Consider, for 
example, the distribution of ages of all living people, not more than 
(say) 60 years old at a specified instant. The variable x ranges from 
0 to 50 years; and tiie ficequency in any interval, not less than a few 
hours, is so large that, since a period of a few hours is very smaU 
compared with 60 years, the distribution may be regarded as 
continuous for all practical purposes. If the necessary data 
were available, we could calculate a continuous function f{x) to 
give the relative frequency density to any reasonable degree of 
accuracy.

The diagram illustrates the frequency curve of an unsymmetrical,
unimodal, continuous distribution. The mode of the distribution is
the abscissa of the maximum ordinate NN'. Since the area under
the curve in any interval represents the relative frequency for that
interval, it follows that the median is the abscissa of the ordinate
MM', which bisects the area under the curve. And from (22) we see
that the mean is the abscissa of the ordinate OO', which passes
through the centroid of the area under the curve. We may mention
in passing that, when the skewness is small, the relation

mean - mode = 3(mean - median)

holds fairly accurately. The student may regard this as an empirical
relation, since we shall not give any proof. The definition of skewness
most frequently used is that of Karl Pearson. It is

skewness = (mean - mode)/s.d.

When the mode can be accurately determined this definition is a
convenient one. The frequency curve in the diagram is that of a
positively skew distribution, the longer tail of the curve being to
the right.

The ratio of the fourth moment about the mean of a distribution
to the square of the variance is independent of the unit employed.
This invariant of the distribution is called its kurtosis, and is
frequently denoted by $\beta_2$. Thus

$$\beta_2 = \mu_4/\mu_2^2. \qquad (24)$$

In the normal distribution, to be considered later, the kurtosis has
the value 3. Since this distribution is regarded as the standard or
ideal, the quantity $\beta_2 - 3$ for any distribution is called its excess of
kurtosis, or briefly its excess. Corresponding to the above notation,
$\beta_1$ is used to denote the invariant $\mu_3^2/\mu_2^3$. This is the square of the
skewness as defined in § 4.
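As a rough illustration, the invariants $\beta_1$ and $\beta_2$ may be computed directly from a table of values and frequencies. The data below (values 0 to 4 with frequencies 1, 4, 6, 4, 1) and the helper names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

def central_moment(values, freqs, r):
    """r-th moment about the mean of a frequency distribution."""
    total = sum(freqs)
    mean = Fraction(sum(f * x for x, f in zip(values, freqs)), total)
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / total

def beta1_beta2(values, freqs):
    """Return (beta_1, beta_2) = (mu_3^2 / mu_2^3, mu_4 / mu_2^2)."""
    mu2 = central_moment(values, freqs, 2)
    mu3 = central_moment(values, freqs, 3)
    mu4 = central_moment(values, freqs, 4)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

values = [0, 1, 2, 3, 4]          # hypothetical data
freqs = [1, 4, 6, 4, 1]
b1, b2 = beta1_beta2(values, freqs)
excess = b2 - 3                   # excess of kurtosis

# Changing the unit (here multiplying every value by 10) leaves both unchanged.
scaled = beta1_beta2([10 * v for v in values], freqs)
print(b1, b2, excess, scaled == (b1, b2))
```

The last line illustrates the statement that these ratios are independent of the unit employed.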
Mills, 1938, 1, chapters iv and v.
Jones, 1924, 3, chapters i-vii.
Rider, 1939, 6, chapters i and ii.
Tippett, 1931, 2, chapters i and ii.
Rietz (ed.), 1924, 1, chapter ii.
Goulden, 1939, 2, chapters i and ii.
Bowley, 1920, 2, part i, chapters v and vi; part ii, chapter i.
Kendall, 1938, 6.

EXAMPLES I 

1. A distribution consists of three components with frequencies
of 200, 250 and 300, having means of 25, 10 and 15, and standard
deviations of 3, 4 and 5 respectively. Show that the mean of the
combined distribution is 16, and its s.d. 7.2 approximately.

2. The yields of grain (x lb.) from 500 small plots are grouped
in classes with a common class interval (0.2 lb.) in the table below,
the values of x given being the mid-values of the classes. Show
that the mean of the distribution is 3.95 lb., its s.d. 0.46 lb., the
median also 3.95 lb., and the quartiles 3.63 and 4.28 lb.

   x     f       x     f       x     f       x     f       x     f
  2.8    4      3.4   47      4.0   88      4.6   35      5.2    4
  3.0   15      3.6   63      4.2   69      4.8   10
  3.2   20      3.8   78      4.4   59      5.0    8

(Mercer and Hall, Journ. Agric. Sci. 1911, vol. 4, p. 107.)


* The references are to the literature listed at the end of the book. The 
student is strongly advised to read some of the literature mentioned at the 
end of each chapter, preferably in the order given. He will thus acquire a 
better background for the mathematical theory.

3. The wages of 1,000 employees range from 4s. 6d. to 19s. 6d.
They are grouped in 15 classes with a common class interval of 1s.,
and the class frequencies, from the lowest class to the highest, are
6, 17, 35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9, 6. Tabulate
the data, and show that the mean wage is 12.006s., the s.d.
2.626s. = 2s. 7½d., the median 12.127s. = 12s. 1½d., and the
mode 12.369s. = 12s. 4½d. nearly. (Adjusted s.d. 2.61s.)

4. With the notation of § 4, and by a method similar to that used
in proving (16), show that

$$\mu'_3 = \mu_3 + 3\bar{x}\mu_2 + \bar{x}^3,$$

$$\mu'_4 = \mu_4 + 4\bar{x}\mu_3 + 6\bar{x}^2\mu_2 + \bar{x}^4,$$

$$\mu'_r = \mu_r + \binom{r}{1}\bar{x}\mu_{r-1} + \binom{r}{2}\bar{x}^2\mu_{r-2} + \dots + \binom{r}{2}\bar{x}^{r-2}\mu_2 + \bar{x}^r.$$

5. The first three moments of a distribution about the value 2
of the variable are 1, 16 and -40. Show that the mean is 3, the
variance 15, and $\mu_3 = -86$. Also show that the first three moments
about x = 0 are 3, 24 and 76.


6. Prove that the mean deviation from the median is less than
that measured from any other value. (See Aitken, 1939, 1, p. 32.)

7. Show that, if the class interval of a grouped distribution is
less than one-third of the calculated s.d., Sheppard's adjustment
makes a difference of less than ½ % in the estimate of the s.d.

8. Show that, if the variable takes the values 0, 1, 2, 3, ..., n
with frequencies proportional to the binomial coefficients
$1, \binom{n}{1}, \binom{n}{2}, \binom{n}{3}, \ldots, \binom{n}{n}$, the mean of the
distribution is ½n, the second moment about x = 0 is n(n+1)/4, and the
variance is ¼n.


9. In a continuous distribution, whose relative frequency density
is given by f(x) = 3x(2-x)/4, the variable ranges from 0 to 2. Show
that the distribution is symmetrical, with mean 1, and variance
1/5. Show that the second and third moments about x = 0 are 6/5
and 8/5 respectively; and verify that $\mu_3 = 0$.


10. Exponential distribution. Consider the continuous distribution
in which $f(x) = ae^{-ax}$, a being positive, and the variable
ranging from 0 to ∞. Show that the mean is 1/a and the variance
$1/a^2$. Also prove that the second and third moments about x = 0
are $2/a^2$ and $6/a^3$ respectively, and that $\mu_3 = 2/a^3$.

11. Factorial moments. The factorial moments of a distribution
are defined as follows. That of order r about the origin (x = 0) is

$$\mu'_{(r)} = \frac{1}{N}\sum_i f_i x_i^{(r)},$$

where $x^{(r)} = x(x-1)(x-2)\ldots(x-r+1)$.

Show that factorial moments are related to ordinary moments by
the equations

$$\mu'_{(3)} = \mu'_3 - 3\mu'_2 + 2\bar{x},$$

$$\mu'_{(4)} = \mu'_4 - 6\mu'_3 + 11\mu'_2 - 6\bar{x},$$

and so on. Similarly

$$\mu'_3 = \mu'_{(3)} + 3\mu'_{(2)} + \bar{x}.$$

The expression for the factorial moment $\mu_{(r)}$ about the mean is
obtained from that defining $\mu'_{(r)}$ by replacing x by the deviation
$x - \bar{x}$ from the mean. Relations corresponding to the above are
obtained by dropping the dashes and putting $\bar{x} = 0$. The student
may then easily verify that factorial moments about the mean are
connected with those about the origin by the equations

$$\mu_{(2)} = \mu'_{(2)} - \bar{x}^2 + \bar{x},$$

$$\mu_{(3)} = \mu'_{(3)} - 3\bar{x}\mu'_{(2)} + 2\bar{x}^3 - 2\bar{x},$$

$$\mu_{(4)} = \mu'_{(4)} - 4\bar{x}\mu'_{(3)} + 6\bar{x}(\bar{x}+1)\mu'_{(2)} - 3\bar{x}(\bar{x}^3 + 2\bar{x}^2 - \bar{x} - 2).$$



CHAPTER II


PROBABILITY AND PROBABILITY
DISTRIBUTIONS

7. Explanation of terms. Measure of probability

We begin by approaching the subject of probability from the 
point of view of the classical theory. Later in the chapter we shall 
give the statistical or empirical approach, and the relation between 
the two will become apparent. We hope in this way to provide a 
simple introduction to the subject, adapted to the needs of those
for whom the book is written.

The throwing of an ordinary cubical die may result in any one
of six different cases, in the sense that any one of the six faces may
be uppermost when the die comes to rest. This group of six cases is
exhaustive, because it includes all possible cases that may result.
The different cases are mutually exclusive, since no two faces can
be uppermost at the same time. The throwing of the die may be
referred to as a trial. In general a trial is the establishing of certain
conditions, which must produce one of several results or cases. Two
such cases are said to be mutually exclusive when the happening of
one of them precludes the happening of the other; and a group of
cases is said to be exhaustive when it includes all possible ones. The
cases favourable to an event A are those that entail the
happening of A. For example, in the throwing of the die three of
the cases are favourable to the appearance of an even number, and
two are favourable to the appearance of a multiple of 3.

Suppose that our die is a perfect cube made of homogeneous
material, and that the marking of the faces has not made it
dynamically unsymmetrical. Then there is no reason to expect that, as the
result of an unbiased throw, any particular face will come uppermost
rather than any other. We say that the six cases are equally likely,
or equally probable. Similarly the 52 cases that may result from
the drawing of a card without discrimination from an ordinary pack
are equally likely. And, in general, two events are said to be equally
likely if, after all relevant evidence has been taken into account, one
of them may not be expected rather than the other.*

The terms probable and probability are used in ordinary language 
in many different connections; and the reader will agree that, in a 
great many instances, it is useless to attempt a numerical estimate 
of the probability under consideration. For instance, we cannot 
give a numerical value to the probability that a certain man’s 
political or religious beliefs are correct, or that a statement by a 
perfect stranger is true. There is, however, a very large group of 
questions in connection with which a numerical estimate of prob- 
ability may be attempted with very useful results. We shall first 
state the method of measurement adopted in the classical theory, 
and then give examples of problems to which the method is applic- 
able. 

Measure of probability.† If a trial may result in any one of n
exhaustive, mutually exclusive and equally likely cases, and m of these
are favourable to an event A, then the probability (or chance) that A
will happen as the result of the trial is measured by the quotient m/n.

This measure of the probability of the event A is denoted by p.
Thus

$$p = m/n, \qquad (1)$$

and p is clearly a positive number, not greater than unity. The
opposite event is understood as the failure of A to happen; and its
probability is denoted by q. Since the number of cases involving the
failure of A is n - m, it follows that

$$q = (n-m)/n = 1 - m/n = 1 - p,$$

so that

$$p + q = 1. \qquad (2)$$
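The measure m/n lends itself to direct enumeration. The sketch below assumes a perfect die, so that the six faces are equally likely; the helper name `probability` is illustrative only.

```python
from fractions import Fraction

def probability(cases, is_favourable):
    """Classical measure m/n over exhaustive, mutually exclusive and
    equally likely cases."""
    cases = list(cases)
    m = sum(1 for case in cases if is_favourable(case))
    return Fraction(m, len(cases))

die = range(1, 7)                                        # the six faces
p_even = probability(die, lambda face: face % 2 == 0)    # three favourable cases
p_mult3 = probability(die, lambda face: face % 3 == 0)   # two favourable cases
q_even = 1 - p_even                                      # failure of the event
print(p_even, p_mult3, p_even + q_even)
```

The calculation is valid only under the stated assumption of equally likely cases; it says nothing about a biased die.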

This method of measuring probability is confined to problems in
which the results of a trial are reducible to a certain number of
equally likely cases. Considerations of symmetry and similarity
frequently enable us to decide whether, in the problem before us,
the resulting cases are of this nature; and, only if they are so, is the
calculation valid. Objection is often raised to the use of the idea of
equally probable cases in defining the measure of probability. The
objection loses some of its force if we remember that, in the
corresponding concept of temperature, there are methods of recognizing
equality of temperature which are quite independent of the
measurement either of absolute temperature or of difference in temperature.

* Cf. Uspensky, 1937, 2, p. 5. † Ibid. p. 6.
When we speak of choosing one at random from a group of n
objects, we mean that the choice is made in such a way that each
object has the same chance of being selected. Various methods have
been devised for making such a choice; and these do not assume any
similarity on the part of the objects in the group. One such method
is illustrated by the procedure often adopted in the case of a lottery.
The objects are numbered 1, 2, ..., n, and n similar marbles are
numbered correspondingly. The marbles are placed in an urn, and
thoroughly mixed; and one of them is then chosen without
discrimination. The number on this marble determines the object
selected. The method thus uses the similarity of the marbles to
ensure that the selection is random. Statisticians have devised
other methods of random selection; but it is unnecessary for us to
examine them.


Example 1. The chance of throwing a 6 with an ordinary die is 1/6. The
chance of throwing an odd number is 3/6 = 1/2.

Example 2. In a class of 12 pupils, 5 are boys and the remainder girls.
The probability that a pupil selected at random will be a girl is 7/12.
The probability that two pupils selected at random will both be girls is
$\binom{7}{2}\big/\binom{12}{2} = 7/22$. The odds against their being both girls are 15:7.

Example 3. From each of three married couples one of the partners is
selected at random. What is the probability of their being all of one sex?
The number of favourable cases is two, viz. all men or all women. The
total number of equally likely cases is $2^3 = 8$. Hence the required probability
is 1/4. The odds against are therefore 3 : 1.

The probability of choosing two men and one woman is 3/8.



8. Theorems of total and compound probability
The determination of probabilities by direct enumeration of the 
number of cases is often laborious. The calculation may be simplified 
by using the theorems of addition and multiplication of probabilities, 
which are also known as the theorems of total and compound
probability. Let us first consider addition.

Suppose that a trial may result in any one of n equally likely cases,
of which $m_1$ are favourable to an event $A_1$, $m_2$ to an event $A_2$, ..., $m_k$
to an event $A_k$. If the events $A_1, A_2, \ldots, A_k$ are mutually
exclusive, the corresponding favourable cases are all different. The
number of cases favourable to either $A_1$ or $A_2$, ... or $A_k$ is therefore
$m_1 + m_2 + \dots + m_k$. Hence the probability that one of these events
will happen is

$$\frac{m_1 + m_2 + \dots + m_k}{n} = \frac{m_1}{n} + \frac{m_2}{n} + \dots + \frac{m_k}{n} = \sum_{i=1}^{k} p_i,$$

where $p_i$ is the probability of the event $A_i$. We may therefore state
the theorem of addition of probabilities:

The probability that one of several mutually exclusive events will
happen is the sum of the probabilities of the separate events.

In particular, if the k events $A_i$ are exhaustive, so that they include
all the mutually exclusive cases that may arise, the sum of the k
numbers $m_i$ is equal to n, and therefore $\sum p_i = 1$. Thus the sum of the
probabilities of the exhaustive and mutually exclusive cases that
may result from the trial is equal to unity. From the above
argument it is also clear that, if an event may occur in several mutually
exclusive forms, the probability of the event is the sum of the
probabilities of its mutually exclusive forms.

Example 1. What is the probability of obtaining a total of 9 points in a
single throw with two dice?

The event may happen in one of the four mutually exclusive forms, 6 and 3,
5 and 4, 4 and 5, 3 and 6, the chance of each of these being 1/36. Hence the
required probability is 1/9.

Show that the probability of a total of 7 points is 1/6, and that of a total
of 10 points is 1/12.

Example 2. From a set of 17 cards, numbered 1, 2, ..., 17, one is drawn at
random. What is the chance that its number is a multiple of 3 or of 7?

The chance of its being a multiple of 3 is 5/17, and that of a multiple of 7
is 2/17. The required probability is therefore 7/17, the two events being
mutually exclusive.

Show that the chance that the number will be a multiple of 3 or of 5, or
of both, is also 7/17.
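The two examples above can be confirmed by counting cases. This is a sketch under the usual assumption that the 36 throws of two dice, and the 17 cards, are equally likely; the function name `chance_of_total` is illustrative.

```python
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=2))    # 36 equally likely cases

def chance_of_total(total):
    """Sum the chances of the mutually exclusive forms giving this total."""
    forms = [t for t in throws if sum(t) == total]
    return sum(Fraction(1, len(throws)) for _ in forms)

cards = range(1, 18)                             # cards numbered 1 to 17
p3 = Fraction(sum(1 for c in cards if c % 3 == 0), 17)
p7 = Fraction(sum(1 for c in cards if c % 7 == 0), 17)
p3_or_7 = Fraction(sum(1 for c in cards if c % 3 == 0 or c % 7 == 0), 17)

print(chance_of_total(9), chance_of_total(7), chance_of_total(10))
print(p3, p7, p3_or_7, p3 + p7 == p3_or_7)
```

The final comparison holds because no number up to 17 is a multiple of both 3 and 7, so the two events are mutually exclusive and the addition theorem applies.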

Consider next a pair of related trials. Suppose, for instance, that
we have an urn containing r red and s white balls, and that we make
a random drawing of two balls in succession. The first trial is the
drawing of the first ball; and, if this proves to be red, we shall say
that the event A has happened. Similarly, the second trial is the
drawing of the second ball, without replacing the first; and, if this
proves to be red, we shall say that the event B has happened. The
probability of A is clearly r/(r+s). That of B depends on whether A
has happened or not. Thus, if A has happened at the first trial, the
probability of B in the second is (r-1)/(r+s-1); but, if A has not
happened, it is r/(r+s-1). The former of these is the conditional
probability of B on the assumption that A has happened. Thus the
two events, A and B, are not independent. Two events are said to
be independent only when the probability of either of them is not
affected by the happening or failure of the other. By way of
illustration consider a combined trial consisting of the throwing of two
ordinary dice, either together or in succession. If the event A is the
throwing of a 6 with the first die, and the event B the throwing of
a 6 with the other, the probability of B is 1/6, whether A has
happened or not. Events A and B are thus independent.

More generally, suppose that a trial or a combination of trials
may result in any one of n equally likely cases, some of which entail
the happening of A alone, some that of B alone, and others that of
both A and B. Let m be the number favourable to the event A.
The cases favourable to both A and B are all included in these m.
Let $m_1$ be their number. Then the probability p that both A and B
will happen is given by

$$p = \frac{m_1}{n} = \frac{m}{n}\cdot\frac{m_1}{m}.$$

The first quotient, m/n, is the probability of the event A. Also,
since m is the number of cases that entail A, and $m_1$ is the number
of these that entail B also, $m_1/m$ is the conditional probability of
B on the assumption that A has happened. Hence we have the
theorem of compound probability:

The probability of the combined occurrence of two events, A and B,
is the product of the probability of A by the conditional probability
of B on the assumption that A has happened.

In the case of independence the result may be stated simply, that
the probability that two independent events will both happen is
the product of the probabilities of the separate events. The theorem
may clearly be extended to include any number of events.

Example 3. A coin is tossed three times. Find the chance that head and
tail will show alternately.

The required event has two mutually exclusive forms, viz. head-tail-head
and tail-head-tail. By the above theorem the chance of either of these is
$(\tfrac12)^3$; and by the theorem of total probability the required chance is then
$(\tfrac12)^3 + (\tfrac12)^3 = \tfrac14$.

Example 4. Three groups of children contain respectively 3 girls and
1 boy, 2 girls and 2 boys, 1 girl and 3 boys. One child is selected at random
from each group. Find the chance that the three selected comprise 1 girl
and 2 boys.

The event may happen in any of the mutually exclusive ways girl-boy-boy,
boy-girl-boy, boy-boy-girl. The probabilities of these three are
$\tfrac34\cdot\tfrac12\cdot\tfrac34$, $\tfrac14\cdot\tfrac12\cdot\tfrac34$, $\tfrac14\cdot\tfrac12\cdot\tfrac14$
respectively. The required probability is their sum, which is 13/32.
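Example 4 may be checked in two ways: by direct enumeration of the 64 equally likely selections, and by the theorems of compound and total probability. The string encoding of the groups ("G" for girl, "B" for boy) and the helper `p` are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product

groups = ["GGGB", "GGBB", "GBBB"]      # composition of the three groups

# Direct enumeration: each triple of children is an equally likely case.
triples = list(product(*groups))
by_enumeration = Fraction(
    sum(1 for t in triples if t.count("G") == 1), len(triples))

def p(group, sex):
    """Chance of drawing a child of the given sex from one group."""
    return Fraction(group.count(sex), len(group))

# Compound probability for each form, total probability over the forms.
by_theorems = sum(
    p(groups[0], a) * p(groups[1], b) * p(groups[2], c)
    for a, b, c in ["GBB", "BGB", "BBG"])

print(by_enumeration, by_theorems)
```

Agreement of the two results illustrates how the theorems replace a laborious count of cases.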


9. Probability distributions. Expected value 

Suppose that, corresponding to the n exhaustive and mutually 

exclusive cases that may result from a trial, a variable x assumes the 

n values with corresponding probabilities p^. Then the assemblage 

of values x^, with their probabilities constitutes the probability 

distribution of the variable for that trial. A variable, which possesses 

a probability distribution, is often called a variate. Most of the 

concepts introduced in connection with frequency distributions are 

equally applicable to probability distributions. Thus the rth 

moment, of the distribution about the value a: » 0 is defined by 

the equation ^ ^ ^ 

(3) 

i 

probability taking the place of relative frequency since, as proved 
in § 8, Spf SI 1. In particular, the first moment about a; b. 0 is 

i 

Just as in the case of frequency this moment is called the mean value 
of the variate, or of the distribution. More commonly, however, it 
is referred to as the expected value or expectation of the variate. It 
is denoted by E{x), or sometimes by 3, though the former is prefer- 
able. Thus 

E{x) = (5) 
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Let ^(x) be a function of the variate x. This assumes the value 
}fr{Xf) when x assumes the value Xf ; and P{ is therefore the probability 
of this value of the function. The expected value of the function is 
thus given by 

E[^(x)] = j:Piir(Xt). ( 6 ) 

i 

In particular (3) expresses that is the expected value of the rth 
power of the variate. 

The rth moment about the mean of the probability distribution 
is defined by a formula corresponding to § 4 (12). Thus 

/t, = 2 PiiXi -xY = E{[x- .E(a:)]0. (7) 

In particular, the second moment about the mean is the variance of 
the distribution; and its positive square root is the standard devia- 
tion, O’. The relation 

(8) 

holds as before. It may be expressed in the alternative notation 

Eiix - fi?(®)]2) = E{x*) - [E{x)y>. (8') 

The relations between Pf and are the same as for a frequency 
distribution; while, corresponding to §2 (6), we have the identity 

E[x-E{x)] = 0. (9) 

In words, tiie expected value of the deviation of a variate from its mean 
is zero. 

Example 1. What is the expected value of the number of points that will
be obtained in a single throw with an ordinary die?

Here the variate is the number of points showing. It assumes the values
1, 2, ..., 6 with probability 1/6 in each case. Hence

E(x) = (1 + 2 + ... + 6)/6 = 7/2 = 3.5.

Example 2. From an urn, containing 3 red balls and 2 white, a man is to
draw two balls at random without replacement, being promised 20s. for each
red ball he draws, and 10s. for each white one. Find his expectation.

For the possible results of the drawing there are three exhaustive and
mutually exclusive cases, viz. 2 red balls, 1 red and 1 white, and 2 white.
The corresponding probabilities are easily shown to be 3/10, 3/5 and 1/10.
The variable is the amount to be given to the man on the result of the draw;
and this has the value 40s. in the first case, 30s. in the second, and 20s. in the
third. The expected value of the amount to be given is therefore

(3/10 × 40 + 3/5 × 30 + 1/10 × 20)s. = 32s.

Example 3. Show that, if c is constant,

$$E(cx) = cE(x), \qquad E[c\psi(x)] = cE[\psi(x)].$$
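The expectation in Example 2 can be reproduced by listing the ten equally likely pairs of balls. The labels "R" and "W" and the dictionary `prize` are conveniences of this sketch, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations

balls = ["R", "R", "R", "W", "W"]      # 3 red and 2 white
prize = {"R": 20, "W": 10}             # shillings promised per ball

draws = list(combinations(range(len(balls)), 2))   # 10 equally likely pairs
amounts = [sum(prize[balls[i]] for i in pair) for pair in draws]

# Probability distribution of the amount, then E(x) = sum of p_i * x_i.
distribution = {x: Fraction(amounts.count(x), len(amounts))
                for x in set(amounts)}
expectation = sum(p * x for x, p in distribution.items())
print(sorted(distribution.items()), expectation)
```

The intermediate dictionary gives the probabilities 3/10, 3/5 and 1/10 quoted in the example.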

10. Expected value of a sum or a product of two variates

Let x and y be two variates, the first of which assumes the m values
$x_i$ with probabilities $p_i$ (i = 1, 2, ..., m), and the second assumes the
n values $y_j$ with probabilities $p'_j$ (j = 1, 2, ..., n). The sum x + y can
assume the mn values $x_i + y_j$, since any of the m values of x may
be associated with any of the n values of y. Let $p_{ij}$ denote the
probability that x assumes the value $x_i$ and, at the same time, y
assumes the value $y_j$. Then by (6)

$$E(x+y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}(x_i + y_j)
= \sum_i\sum_j p_{ij}x_i + \sum_i\sum_j p_{ij}y_j.$$

For $\sum_j p_{ij}$, being the sum of the probabilities that x assumes the
value $x_i$ while y assumes one of the values $y_1, y_2, \ldots, y_n$, is equal to
$p_i$. Similarly $\sum_i p_{ij} = p'_j$. Consequently

$$E(x+y) = E(x) + E(y). \qquad (10)$$

Thus the expected value of the sum of two variates is equal to the sum
of their expected values; and the theorem may be extended to the
sum of any number of variates.

The variates, x and y, are said to be independent* if the probability
that either of them will assume a prescribed value does not depend
on the value assumed by the other. We proceed to examine the
expected value of the product, xy, of two independent variates. In
the above notation this product may assume the mn mutually
exclusive values $x_i y_j$. The probability that the product will assume
a particular value, $x_i y_j$, is $p_i p'_j$, being the product of the probabilities
that x will assume the value $x_i$ and y the value $y_j$. Hence the
expected value of the product is given by

$$E(xy) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_i p'_j x_i y_j.$$

Performing first the summation with respect to j, and then that
with respect to i, we have

$$E(xy) = \sum_i p_i x_i E(y) = E(y)\sum_i p_i x_i = E(x).E(y). \qquad (11)$$

Thus the expected value of the product of two independent variates is
equal to the product of their expected values.

* Or, statistically independent. Since this is the kind of independence with
which we are chiefly concerned in these pages, the adverb 'statistically' will
usually be omitted. Statistical independence has been described by Aitken
(1939, 1, p. 148) as 'obedience to the multiplication theorem of probability'.
This will be apparent to the student after he has read §§ 12 and 31 below.
Statistical independence should not be confused with functional independence.
The variables x and y are functionally dependent when there exists a
functional relation F(x, y) = 0 which holds identically. They are functionally
independent if no such relation exists.

If the mean of either of the independent variates, say x, is taken
as origin for that variate, then E(x) = 0, and consequently E(xy) = 0.
In particular this relation holds when both the variates are
measured from their means. The expected value of the product of the
deviations of the two variates from their means is called their
covariance. Thus the covariance of two independent variates is equal
to zero. From this we may deduce the important theorem:

The variance of the sum of two independent variates is equal to the
sum of their variances.

If the two variates are measured from their means the variance
of their sum, being the expected value of $(x+y)^2$, is given by

$$E(x^2 + 2xy + y^2) = E(x^2) + E(y^2),$$

and is therefore equal to the sum of the variances of x and y. The
theorem may clearly be extended to the sum of any number of
variates which are independent in pairs.
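The theorems of this section can be verified numerically for a simple pair of independent variates. The choice of a die and a coin (scored 0 or 1), and the helper names `expect` and `variance`, are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

def expect(dist, func=lambda v: v):
    """E[func(x)] for a distribution given as {value: probability}."""
    return sum(p * func(v) for v, p in dist.items())

def variance(dist):
    return expect(dist, lambda v: v * v) - expect(dist) ** 2

die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # hypothetical second variate

# Independence: the probability of the pair (x_i, y_j) is the product p_i * p'_j.
joint = {(x, y): px * py
         for (x, px), (y, py) in product(die.items(), coin.items())}

e_sum = sum(p * (x + y) for (x, y), p in joint.items())
e_prod = sum(p * x * y for (x, y), p in joint.items())
var_sum = sum(p * (x + y - e_sum) ** 2 for (x, y), p in joint.items())

print(e_sum == expect(die) + expect(coin))         # relation (10)
print(e_prod == expect(die) * expect(coin))        # relation (11)
print(var_sum == variance(die) + variance(coin))   # variance of a sum
```

Relation (10) would still hold if the joint probabilities were not products; the other two comparisons depend on independence.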
11. Repeated trials. Binomial distribution

Suppose that a trial is repeated, so that we have a series of n
trials. The happening of the event A as the result of a trial will be
called a success. We consider first the case in which the trials are
independent, and the probability p of success is the same for each
trial. Then the probability of failure is q, where p + q = 1. The
probability that there will be exactly r successes in the series of n
trials, and therefore n - r failures, is easily found. For, by the theorem
of compound probability, the chance of r successes and n - r failures
in a specified order is $p^r q^{n-r}$. But the number of different orders in
which these successes and failures may occur is $\binom{n}{r}$, being the number
of ways of selecting r out of the n positions for the successes.
Consequently, by the theorem of total probability, the chance P of
exactly r successes in the series of trials is given by

$$P = \binom{n}{r} p^r q^{n-r}. \qquad (12)$$

Thus the probabilities of 0, 1, 2, ..., n successes, in a series of n
trials for an event of constant probability p, are the respective terms
of the binomial expansion

$$(q+p)^n = q^n + nq^{n-1}p + \dots + nqp^{n-1} + p^n.$$

The probability distribution of the number of successes thus
determined is called the binomial distribution. Thus the binomial
distribution is that in which the variate assumes the values 0, 1, 2, ..., n
with corresponding probabilities $q^n, nq^{n-1}p, \ldots, p^n$.
We may show that the expected value of the variate in the binomial
distribution* is np, and its variance npq. In terms of repeated trials,
the expected value of the number of successes in one trial is p,
since the variate assumes the values 1 and 0 with probabilities p
and q respectively. And, since the number of successes in n trials
is the sum of the numbers of successes in the individual trials, it
follows from (10) that the expected value of this number is np.
Similarly, to find the variance of the distribution we observe that,
in one trial, the square of the number of successes takes the values
1 and 0 with probabilities p and q respectively, and the expected
value of $x^2$ in one trial is therefore p. Hence the variance of the
number of successes in one trial is

$$E(x^2) - [E(x)]^2 = p - p^2 = pq.$$

And since the number of successes in n trials is the sum of the
numbers in the individual trials, and these trials are independent, the
variance of the total number of successes is the sum of the variances
for the separate trials, and is therefore npq. The s.d. of the number
of successes in n trials is $\sqrt{npq}$; and the s.d. of the proportion of
successes in the series is therefore $\sqrt{pq/n}$.

* Other proofs of these properties will be given in § 15, Ex. 8, and § 17.
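The values np and npq can be confirmed by summing over the terms of (12). The sketch below uses the illustrative values n = 10 and p = 2/5, which are not taken from the text.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """Probabilities of 0, 1, ..., n successes, term by term as in (12)."""
    q = 1 - p
    return [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

n, p = 10, Fraction(2, 5)             # illustrative values
q = 1 - p
probs = binomial_distribution(n, p)

total = sum(probs)                                   # terms of (q + p)**n
mean = sum(r * P for r, P in enumerate(probs))
var = sum(r * r * P for r, P in enumerate(probs)) - mean ** 2
print(total, mean == n * p, var == n * p * q)
```

This is a check by brute force, not a proof; the argument in the text obtains the same results without any summation.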

Example 1. Find the most probable number of successes in the above series
of trials, that is to say, the number of successes which has a greater
probability than any other.

The chance of r + 1 successes will be greater than that of r if

$$\frac{n-r}{r+1}\cdot\frac{p}{q} > 1,$$

that is, if

$$(n+1)p > r+1.$$

Hence the most probable number of successes is the integral part of (n + 1)p.
If, however, (n + 1)p is an integer, r + 1, the chance of r + 1 successes is equal
to that of r successes, and is greater than that of any other number. Thus, if
p = 2/5, the most probable number of successes in 20 trials is the integral
part of 21 × 2/5, which is 8. In 24 trials, however, 9 and 10 are the most
probable numbers.
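The rule of Example 1 can be compared with a direct search for the largest term of the binomial distribution. The function name `most_probable` is an illustrative choice for this sketch.

```python
from fractions import Fraction
from math import comb

def most_probable(n, p):
    """All values of r for which C(n, r) p**r q**(n - r) is greatest."""
    q = 1 - p
    probs = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]
    best = max(probs)
    return [r for r, P in enumerate(probs) if P == best]

p = Fraction(2, 5)
print(most_probable(20, p))    # compare with the integral part of 21 * 2/5
print(most_probable(24, p))    # here 25 * 2/5 is an integer
```

Exact fractions are used so that the tie between two adjacent values, when (n + 1)p is an integer, is detected reliably.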

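Example 2. A's chance of winning a game against B is 2/3. Find his
chance of winning at least three games out of five.

This chance is the sum of the probabilities that A will win 3, 4, and 5 games,
and is therefore

$$\binom{5}{3}\left(\tfrac23\right)^3\left(\tfrac13\right)^2 + \binom{5}{4}\left(\tfrac23\right)^4\left(\tfrac13\right) + \left(\tfrac23\right)^5 = \frac{192}{243}.$$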


Example 3. Poisson's series of trials is a series of n trials in which the
probabilities of success are $p_1, p_2, p_3, \ldots, p_n$ respectively. Show that the
expected value of the number of successes is $\sum_{i=1}^{n} p_i$, and the variance
$\sum_i p_i q_i$.

By the same argument as above, the expected value of the number of
successes at the ith trial is $p_i$, and its variance $p_i q_i$. Hence the result.

Example 4. Verify from (7) that a distribution, in which the variate takes
the values 1 and 0 with probabilities p and q respectively, has variance pq.

12. Continuous probability distributions
As in the case of frequency, a continuous distribution is one in
which the variable may take any value between certain limits,
a and b. The number of different values is then infinite, and it is
useless to speak of the probability of any particular value. Instead
of this we consider the probability that the value of the variate will
fall within a specified interval in the range of the variate. We confine
our attention to cases in which the probability that the value of the
variate will fall within the infinitesimal interval $x - \tfrac12 dx$ to $x + \tfrac12 dx$
is expressible in the form $\phi(x)\,dx$, where $\phi(x)$ is a continuous function
of x, called the probability density or the probability function. It is,
of course, never negative. The continuous curve $y = \phi(x)$ is called
the probability curve; and, when this is symmetrical, the
distribution is said to be symmetrical. The area under the curve from $x = \alpha$
to $x = \beta$ represents the probability that the value of x will fall
within the interval $\alpha$ to $\beta$. The total area under the curve is unity.
Thus

$$\int_a^b \phi(x)\,dx = 1. \qquad (13)$$

When $\phi(x)$ is constant, the variate is said to have a uniform or
rectangular distribution of probability. The value of the constant
must be 1/(b - a), in virtue of (13).

The rth moment of the distribution about a particular value is
the rth moment of the probability about that value. Since $\phi(x)\,dx$
is the probability corresponding to the interval dx, the rth moment
about x = 0 is given by

$$\mu'_r = \int_a^b x^r \phi(x)\,dx. \qquad (14)$$

In particular, the expected value of x, being the first moment about
x = 0, is

$$E(x) = \bar{x} = \int_a^b x\phi(x)\,dx, \qquad (15)$$

and the expected value of the function $\psi(x)$ is

$$E[\psi(x)] = \int_a^b \psi(x)\phi(x)\,dx. \qquad (16)$$

The rth moment about the mean of the distribution is

$$\mu_r = \int_a^b [x - E(x)]^r \phi(x)\,dx, \qquad (17)$$

which is the expected value of the rth power of the deviation of x
from the mean. The variance of x, being the second moment about
the mean, is given by

$$\mu_2 = \int_a^b [x - E(x)]^2 \phi(x)\,dx$$

or

$$\sigma^2 = \int_a^b x^2\phi(x)\,dx - [E(x)]^2. \qquad (18)$$

The theorems of § 10 are equally true for continuous distributions.
Let the values of x range from a to b, and those of a second variate
y from c to d. We may assume that the probability that x falls in the
interval dx and, at the same time, y falls in the interval dy is jointly
proportional to dx and dy, and expressible in the form $\phi(x, y)\,dx\,dy$,
where $\phi(x, y)$ is a continuous function of the two variates. Then the
expected value of the sum of the variates is

$$E(x+y) = \int_a^b\!\!\int_c^d (x+y)\phi(x,y)\,dx\,dy
= \int_a^b\!\!\int_c^d x\phi(x,y)\,dx\,dy + \int_a^b\!\!\int_c^d y\phi(x,y)\,dx\,dy.$$

In the first of these two integrals let the integration with respect to
y be performed first. Then $dx\int_c^d \phi(x,y)\,dy$ is the probability that x
will fall in the interval dx irrespective of the value of y. Denote it
by $\phi_1(x)\,dx$. Similarly, in the second integral, $dy\int_a^b \phi(x,y)\,dx$ is the
probability that y will fall in the interval dy irrespective of the value
of x. Denote it by $\phi_2(y)\,dy$. Then

$$E(x+y) = \int_a^b x\phi_1(x)\,dx + \int_c^d y\phi_2(y)\,dy = E(x) + E(y), \qquad (19)$$

as required.

In considering the expected value of the product xy we assume
that the variates are independent. Then the probability that the
value of x falls in the interval dx is independent of the value of y,
and may be expressed as $\phi_1(x)\,dx$. Similarly, the probability that
y falls in the interval dy is independent of the value of x, and is
expressible in the form $\phi_2(y)\,dy$. Therefore the probability that x
falls in the interval dx and, at the same time, y falls in the interval
dy is $\phi_1(x)\phi_2(y)\,dx\,dy$. Accordingly, the expected value of the
product xy is

$$E(xy) = \int_a^b\!\int_c^d xy\,\phi_1(x)\phi_2(y)\,dx\,dy = \int_a^b x\,\phi_1(x)\,dx \int_c^d y\,\phi_2(y)\,dy = E(x)\,.\,E(y). \tag{20}$$

Thus the expected value of the product of two independent variates
is equal to the product of their expected values. And it follows, as
in § 10, that the covariance of two independent variates is equal
to zero.
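As a numerical illustration of (19) and (20), the following Python sketch evaluates the double integrals by the midpoint rule. The two densities (x uniform on 0 to 2, y with density 2y on 0 to 1) and the function names are assumptions chosen purely for illustration; they do not come from the text.

```python
# Midpoint-rule check of E(x+y) = E(x) + E(y) and E(xy) = E(x)E(y)
# for two independent variates. The densities below are illustrative assumptions.

def phi1(x):
    """Assumed density of x: uniform on (0, 2)."""
    return 0.5

def phi2(y):
    """Assumed density of y: 2y on (0, 1)."""
    return 2.0 * y

def expect(g, n=400):
    """Approximate the double integral of g(x, y) * phi1(x) * phi2(y)."""
    a, b, c, d = 0.0, 2.0, 0.0, 1.0
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            total += g(x, y) * phi1(x) * phi2(y)
    return total * hx * hy

e_x = expect(lambda x, y: x)          # exact value 1
e_y = expect(lambda x, y: y)          # exact value 2/3
e_sum = expect(lambda x, y: x + y)    # should agree with e_x + e_y
e_prod = expect(lambda x, y: x * y)   # should agree with e_x * e_y
```

The product rule holds here only because the joint density was built as the product of the two separate densities; with a joint density that does not factorise, the sum rule would still hold but the product rule in general would not.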

Example 1. Show that, if the variable x has uniform distribution of
probability over the range $-a$ to $+a$, then $\phi(x) = 1/2a$, $\bar{x} = 0$,
$\sigma^2 = a^2/3$, the moments of odd order about $x = 0$ are zero, and that
of order $2n$ is $a^{2n}/(2n+1)$.

Example 2. Through a point B on the y-axis, whose ordinate is positive
and equal to a, a straight line is drawn in a direction taken at random in the
interval $-\tfrac14\pi$ to $\tfrac14\pi$, $\theta$ being the inclination of the line to BO. Examine
the probability distribution of the intercept x on the x-axis.

We interpret the data as meaning that the variable $\theta$ has uniform
distribution of probability in the interval $-\tfrac14\pi$ to $\tfrac14\pi$. Hence the
probability that $\theta$ will fall in the interval $d\theta$ is $2\,d\theta/\pi$. Now the intercept
on the x-axis has the value $x = a\tan\theta$, so that $\theta = \arctan(x/a)$, and therefore

$$d\theta = \frac{a\,dx}{a^2+x^2}.$$

But, when $\theta$ falls in the interval $d\theta$, x falls in the corresponding interval dx.
Hence the probability density for the distribution of x is

$$\frac{2a}{\pi(a^2+x^2)} \qquad (-a \le x \le a).$$

This is also the relative frequency density of the distribution in § 6, Ex. 2,
and the properties of the distributions are the same. There is symmetry
about $x = 0$, and the s.d. is $0\cdot52a$.
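The standard deviation quoted in Example 2 can be checked numerically. The sketch below assumes the reading of the example in which the inclination is uniform on $-\tfrac14\pi$ to $\tfrac14\pi$ (so that x ranges from $-a$ to $a$); the function name is hypothetical, and the closed form $a\sqrt{4/\pi - 1}$ is obtained by integrating $x^2 \cdot 2a/\pi(a^2+x^2)$ over that range.

```python
import math

def sd_of_intercept(a, n=100000):
    """S.D. of x = a*tan(theta), theta uniform on (-pi/4, pi/4), by the midpoint rule."""
    lo, hi = -math.pi / 4, math.pi / 4
    h = (hi - lo) / n
    second_moment = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        x = a * math.tan(theta)
        # the probability attached to d(theta) is 2 d(theta)/pi
        second_moment += x * x * (2.0 / math.pi) * h
    return math.sqrt(second_moment)   # the mean is zero by symmetry

numeric = sd_of_intercept(1.0)
closed_form = math.sqrt(4.0 / math.pi - 1.0)   # about 0.52 for a = 1
```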

13. Theorems of Tchebychef and Bernoulli 
Consider first a theorem due to Tchebychef. Let x be a variate,
either discrete or continuous, with s.d. $\sigma$. The theorem to be proved
fixes an upper limit to the probability that a value of the variate,
chosen at random, will differ from the mean by more than $\lambda\sigma$,
where $\lambda$ is a given positive number. Tchebychef's theorem may be
stated:

In a random choice of a value of a variate, whose standard deviation
is $\sigma$, the probability that the value chosen will differ from the mean by
more than $\lambda\sigma$ does not exceed $1/\lambda^2$.

To prove it we observe that the variance $\sigma^2$ is the second moment
of the probability of the whole distribution about the mean. Now,
if the combined probability P of values further than $\lambda\sigma$ from the
mean were greater than $1/\lambda^2$, the second moment of the probability
of these values alone would exceed $(\lambda\sigma)^2/\lambda^2$, i.e. $\sigma^2$, and that of the
whole distribution would, a fortiori, be greater than $\sigma^2$. Since this
is not so the statement in the theorem must be true. Thus

$$P \le 1/\lambda^2. \tag{21}$$

In particular the probability that a value of the variate will differ
from the mean by more than $3\sigma$ does not exceed 1/9. The theorem
is a very conservative one, since the actual value of P is usually
very much less than $1/\lambda^2$. The result, however, applies to all
distributions; and we shall now use it to prove a theorem due to
Bernoulli.
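How conservative the bound (21) is can be seen on a particular distribution. The Python sketch below assumes, purely for illustration, the density $e^{-x}$ on 0 to $\infty$ (mean 1, s.d. 1), for which the tail probabilities are available in closed form; the function name is hypothetical.

```python
import math

def tail_probability_exponential(lam):
    """P(|x - mean| > lam*sigma) for the density exp(-x), x >= 0 (mean 1, s.d. 1)."""
    upper = math.exp(-(1.0 + lam))                               # P(x > 1 + lam)
    lower = 1.0 - math.exp(-(1.0 - lam)) if lam < 1.0 else 0.0   # P(x < 1 - lam)
    return upper + lower

for lam in (1.5, 2.0, 3.0):
    # actual probability alongside the Tchebychef limit 1/lam^2
    print(lam, tail_probability_exponential(lam), 1.0 / lam**2)
```

For $\lambda = 3$ the actual probability is $e^{-4}$, much smaller than the limit 1/9 given by the theorem.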

Let m be the number of successes obtained in n independent
trials, in which the constant probability of the occurrence of the
event A is p. The quotient $m/n$ is the relative frequency of successes.
How does this quotient behave as n increases indefinitely? It is a
matter of common knowledge that, in trials of this nature in which
the value of p is known, the relative frequency obtained is usually a
close approximation to p when n is large. But there is no proof,
based on the measure of probability given in § 7, that the relative
frequency of successes will have p as limiting value when n tends to
infinity. James Bernoulli, however, proved that, given any positive
number $\epsilon$ however small, the probability of $|m/n - p|$ exceeding $\epsilon$
tends to zero as n tends to infinity. This is expressed in modern
terminology by saying that $m/n$ converges in probability to p as n
tends to infinity. Bernoulli's theorem may also be expressed in the
form:

Let $\epsilon$ and $\eta$ be two given positive numbers, however small, and let m
be the number of successes in n independent trials, in which the constant
probability of success is p. Then the probability that the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon \tag{22}$$

will hold is greater than $1 - \eta$, provided that n is greater than a certain
number N, depending on $\epsilon$ and $\eta$.

A simple proof may be given as follows. It was shown in § 11 that
the mean value of the relative frequency of successes in such a
series of trials is p, and its variance $\sigma^2$ is $pq/n$. If then we write
$\epsilon = \lambda\sigma$, the condition

$$\left|\frac{m}{n} - p\right| > \epsilon$$

is the condition that the relative frequency of successes should
differ from its mean by more than $\lambda\sigma$. But, by Tchebychef's theorem,
the probability of this does not exceed $1/\lambda^2$, so that

$$P\left(\left|\frac{m}{n} - p\right| > \epsilon\right) \le \frac{1}{\lambda^2} = \frac{\sigma^2}{\epsilon^2} = \frac{pq}{n\epsilon^2},$$

and this is less than $\eta$ for all values of n greater than $pq/\eta\epsilon^2$. Hence
Bernoulli's theorem.
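The number N supplied by this proof is easily computed, and it is usually far larger than is actually needed. The Python sketch below is an illustration only: the values $p = 1/2$, $\epsilon = 1/10$, $\eta = 1/20$ and the function names are assumptions, and exact fractions are used for the bound so that rounding does not disturb the strict inequality.

```python
import math
from fractions import Fraction

def bernoulli_sample_size(p, eps, eta):
    """Smallest integer n exceeding pq/(eta*eps^2), the bound from Tchebychef's theorem."""
    q = 1 - p
    bound = p * q / (eta * eps**2)
    return math.floor(bound) + 1

def exact_failure_probability(n, p, eps):
    """P(|m/n - p| >= eps), summed directly from the binomial probabilities."""
    q = 1.0 - p
    total = 0.0
    for m in range(n + 1):
        if abs(m / n - p) >= eps:
            total += math.comb(n, m) * p**m * q**(n - m)
    return total

p, eps, eta = Fraction(1, 2), Fraction(1, 10), Fraction(1, 20)
n_required = bernoulli_sample_size(p, eps, eta)          # pq/(eta*eps^2) = 500 here
failure = exact_failure_probability(n_required, float(p), float(eps))
```

With these values the exact probability of failure is very much smaller than $\eta$, in keeping with the remark that Tchebychef's limit is a conservative one.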

The theorem does not assert that the inequality (22) must hold
for all values of n greater than N, but that the probability of its not
holding is less than $\eta$. Even this probability, however small, leaves
room for the possibility that it may not hold on some particular
occasion. Bernoulli's theorem explains the practice of taking $m/n$,
for a large value of n, as an approximation to the value of p. Indeed,
it frequently happens that this use of relative frequency is our only
means of estimating the probability of the event.


14. Empirical definition of probability
As already indicated the definition of probability in terms of
equally likely cases does not lend itself to every instance in which
a numerical evaluation of probability is desired. Another definition
of the probability of an event is sometimes given in terms of the
relative frequency of the occurrence of the event in an extended
series of trials. The fundamental assumption for such a definition
is that this relative frequency, in a uniform series of trials, tends to
a definite limit as the number of trials in the series tends to infinity;
and this limit is taken as the measure of the probability of the
occurrence of the event in another such trial. No proof can be given
that the above relative frequency does tend to a limit. Convergence
in probability, mentioned in connection with Bernoulli's theorem,
is a consequence of the original definition, and is not the same
thing as the convergence to a limit assumed in the empirical
definition. But, though the two approaches are not theoretically
equivalent, they may be regarded as in agreement for practical
purposes.

By means of the assumption on which the empirical definition is
based, the laws of addition and multiplication of probabilities can
be deduced. Suppose, as in § 8, that a trial may result in any one of
the mutually exclusive events $A_i$. In a series of n trials let $m_i$ be
the number of times in which the event $A_i$ happens. Then the
probability $p_i$ of the happening of this event is given by

$$p_i = \lim_{n\to\infty} \frac{m_i}{n}.$$

Since the number of times in which one of the events $A_1, A_2, \dots, A_k$
happens is $\sum_{i=1}^{k} m_i$, the probability of the happening of one of these
events is

$$\lim_{n\to\infty} \sum_{i=1}^{k} \frac{m_i}{n} = \sum_{i=1}^{k} p_i,$$

and we have the theorem of total probability as in § 8.

Suppose next that A and B are different events that may happen
as the results of specified trials T and T' respectively. We require
the probability that, in a pair of such trials, both events will happen.
In a series of n pairs of trials let m be the number of times in which A
happens, and $m_1$ the number of times in which both A and B happen.
These occasions are all included in the m occasions on which A
happens. Then the probability of the happening of both events is

$$\lim_{n\to\infty}\frac{m_1}{n} = \lim_{n\to\infty}\left(\frac{m}{n}\cdot\frac{m_1}{m}\right) = \left(\lim_{n\to\infty}\frac{m}{n}\right)\left(\lim_{m\to\infty}\frac{m_1}{m}\right),$$

since m tends to infinity with n, if the probability of A is not zero.
Now $\lim m/n$ is the probability of A, and $\lim m_1/m$ is the
conditional probability of B on the assumption that A has happened.
We thus have the theorem of compound probability as in § 8. In
the particular case of independence the probability of the occurrence
of the two events is simply the product of the probabilities of the
separate events. As we have already seen, the theorems of total and
compound probability are the foundations of the mathematical
theory. The measure of probability by relative frequency is also
fundamental. In the a priori definition it is the relative frequency
of favourable cases in the total number of cases, while, in the
empirical definition, it is the limit of the relative frequency of the
happenings of the event under consideration.


15. Moment generating function and characteristic function
Let $\phi(x)$ be the probability density in the distribution of the
variate x. The expected value of $e^{tx}$ is a function of t given by

$$M(t) = \int e^{tx}\phi(x)\,dx, \tag{23}$$

where the integration is taken over the whole range of x. When this
integral has a meaning for a certain range of values of t, we may
expand the exponential and integrate term by term, thus obtaining
the formula

$$M(t) = 1 + \mu'_1 t + \mu'_2 t^2/2! + \dots + \mu'_r t^r/r! + \dots, \tag{24}$$

where

$$\mu'_r = \int x^r \phi(x)\,dx,$$

being the moment of order r about the origin ($x = 0$). For this
reason the function $M(t)$, defined by (23), is called the moment
generating function (m.g.f.) of the distribution about the value
$x = 0$. Similarly the m.g.f. about the value $x = a$ is defined
as the expected value of $\exp[t(x-a)]$. Denoting this by $M_a(t)$
we have then

$$M_a(t) = \int \exp[t(x-a)]\,\phi(x)\,dx, \tag{25}$$

provided the integral has a meaning; and the expansion of the
exponential, and term by term integration, then show that the
coefficient of $t^r/r!$ is the moment of order r about the value $x = a$.
Since the factor $e^{-at}$ is independent of x, it follows from (25) that

$$M_a(t) = e^{-at}M_0(t), \tag{26}$$

the subscript indicating the value with respect to which the m.g.f.
is constructed.

When the variate x takes only the discrete values $x_i$ with
probabilities $p_i$ ($i = 1, 2, \dots, n$), the m.g.f. with respect to the value
$x = a$, being the expected value of $\exp[t(x-a)]$, is given by

$$M_a(t) = \sum_i p_i \exp[t(x_i - a)] = e^{-at}M_0(t) \tag{27}$$

as before; and the coefficient of $t^r/r!$ in the expansion is the moment
of order r about the value in question. The m.g.f. will be found useful,
not only in calculating the moments of a distribution, but also in
leading to concise proofs of various theorems. Along with it may
be mentioned the characteristic function* (c.f.), which is the
expected value $E(e^{itx})$ of $e^{itx}$, where $i = \sqrt{-1}$.

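Example 1. For the exponential distribution defined by $\phi(x) = ce^{-cx}$, in
which c is positive and x varies from 0 to $\infty$, the m.g.f. with respect to the
origin is

$$M(t) = c\int_0^\infty \exp(tx - cx)\,dx = \frac{c}{c-t} = 1 + \frac{t}{c} + \frac{t^2}{c^2} + \frac{t^3}{c^3} + \dots \qquad (|t| < c),$$

showing that

$$\mu'_1 = \frac{1}{c}, \quad \mu'_2 = \frac{2!}{c^2}, \quad \dots, \quad \mu'_r = \frac{r!}{c^r}.$$

Thus the mean is $1/c$, and the variance is given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = 1/c^2,$$

and the s.d. by $\sigma = 1/c$.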
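The moments read off from the m.g.f. in Example 1 may be compared with direct numerical integration. The Python sketch below is a check under stated assumptions: the value $c = 2$, the truncation of the range at $60/c$ and the function name are illustrative choices, and the midpoint rule is used in place of exact integration.

```python
import math

def raw_moment(r, c, upper=60.0, n=200000):
    """Midpoint-rule value of the integral of x^r * c*exp(-c*x) from 0 to upper/c."""
    b = upper / c            # the neglected tail beyond b is negligible
    h = b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x**r * c * math.exp(-c * x)
    return total * h

c = 2.0
moments = [raw_moment(r, c) for r in (1, 2, 3)]
expected = [math.factorial(r) / c**r for r in (1, 2, 3)]   # r!/c^r from the m.g.f.
variance = moments[1] - moments[0] ** 2                    # should be close to 1/c^2
```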

* Cf. Lévy, 1925, 3, part II, chapters II and III; Cramér, 1937, 6,
pp. 23-68; and Kendall, 1943, 2, pp. 90-100.

Example 2. For the rectangular distribution, $\phi(x) = 1/2a$ ($-a \le x \le a$),
we have

$$M_0(t) = \frac{1}{2a}\int_{-a}^{a} e^{tx}\,dx = \frac{e^{at} - e^{-at}}{2at} = \frac{\sinh at}{at} = 1 + a^2t^2/3! + a^4t^4/5! + a^6t^6/7! + \dots.$$

The moments of odd order are zero, and $\mu'_{2r} = a^{2r}/(2r+1)$.

Example 3. For the binomial distribution the m.g.f. with respect to the
origin is, in virtue of (27),

$$M(t) = q^n + \binom{n}{1}q^{n-1}pe^{t} + \binom{n}{2}q^{n-2}p^2e^{2t} + \dots = (q + pe^t)^n = (1 + pt + pt^2/2! + pt^3/3! + \dots)^n.$$

The mean, being the coefficient of t in the expansion, is np. Similarly $\mu'_2$,
being the coefficient of $t^2/2!$, has the value

$$np + \binom{n}{2}2!\,p^2 = np[1 + (n-1)p].$$

Consequently the variance is given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = np(1-p) = npq.$$


A very important property of the m.g.f. is expressed in the
theorem:

The moment generating function of the sum of two independent
variates is the product of their moment generating functions.

This is a direct consequence of the theorem concerning the
expected value of the product of two independent variates. For, if
x and y are independent variates, the m.g.f. of their sum with respect
to the origin is

$$E(e^{t(x+y)}) = E(e^{tx}\,.\,e^{ty}) = E(e^{tx})\,E(e^{ty}),$$

and is therefore the product of their m.g.f.'s. And, since the origin
may be chosen at pleasure, the theorem holds for the m.g.f.'s about
any specified value.

Let $\mu_r$, $\mu'_r$ denote the moments of x about the mean and the origin,
and $m_r$, $m'_r$ those of y. Then, by the above theorem, the m.g.f. of
$x+y$ has an expansion obtained from the product

$$(1 + \mu'_1 t + \mu'_2 t^2/2! + \mu'_3 t^3/3! + \dots)(1 + m'_1 t + m'_2 t^2/2! + \dots).$$

The coefficient of t in this product is $\mu'_1 + m'_1$, so that the mean of the
sum of the variates is the sum of their means. From the coefficient
of $t^2/2!$ we find that the second moment of $x+y$ about the origin is
$\mu'_2 + 2\mu'_1 m'_1 + m'_2$. Consequently the second moment of $x+y$ about
its mean is

$$\mu'_2 + 2\mu'_1 m'_1 + m'_2 - (\mu'_1 + m'_1)^2 = (\mu'_2 - \mu'^2_1) + (m'_2 - m'^2_1) = \mu_2 + m_2,$$

and is thus equal to the sum of the variances of x and y. Since the
variance of $-y$ is equal to that of y, we have again the theorem:

The variance of the sum (or difference) of two independent variates
is equal to the sum of their variances.

From their definitions it is clear that the moment generating
function, when it exists, and the characteristic function, which
always exists, are determined by the probability density $\phi(x)$
of the distribution. Conversely, the distribution is determined
uniquely by its characteristic function,* or by the moment generating
function when the moments satisfy certain conditions;†
that is to say, variates which have the same moment generating
function conform to the same distribution. Proofs of these
statements are beyond the scope of this book. In the three cases in
which we shall make use of this converse theorem (pp. 49, 67-8
and 151) the moments of the distributions satisfy the necessary
conditions.

16. Cumulative function of a distribution 
If the logarithm of the m.g.f. of a distribution can be expanded
as a convergent series in powers of t, viz.

$$K(t) = \log M(t) = \kappa_1 t + \kappa_2 t^2/2! + \kappa_3 t^3/3! + \dots, \tag{28}$$

the coefficients, $\kappa_r$, are called the cumulants (or seminvariants‡) of
the distribution, and $K(t)$ is the cumulative function. The cumulants
are determinate functions of the moments. For instance, on taking
logarithms of both members of (24) and identifying with (28), we
see that $\kappa_1 = \mu'_1$, and is thus equal to the mean of the distribution.

* For a proof of this converse theorem see Lévy, 1925, 3, pp. 166-7;
also Deltheil, Erreurs et Moindres Carrés, pp. 26-9 (Fasc. 2, Tome 1 of
Traité du Calcul des Probabilités et de ses Applications, ed. E. Borel),
Gauthier-Villars, Paris, 1930, and Kendall, 1943, 2, pp. 90-4.

† Cf. Kendall, 1943, 2, pp. 105-10.

‡ We adopt Fisher's terminology of cumulants (1929, 1) rather than
Thiele's of seminvariants (1903, 1), since Dressel has shown that the cumulants
are only one particular set of seminvariants (Ann. Math. Stat. vol. XI, pp.
33-57, 1940).

Since, by (26), the m.g.f. with respect to the mean is

$$e^{-\bar{x}t}M_0(t),$$

it is clear on taking logarithms that the cumulative function with
respect to the mean differs from that with respect to $x = 0$ only
by the addition of the term $-\bar{x}t$. Consequently the cumulative
function relative to the mean is

$$\kappa_2 t^2/2! + \kappa_3 t^3/3! + \dots + \kappa_r t^r/r! + \dots. \tag{29}$$

And since this must be identical with

$$\log(1 + \mu_2 t^2/2! + \mu_3 t^3/3! + \dots),$$

we see, on comparing coefficients of like powers of t, that the first
few cumulants are given in terms of the moments of the distribution
by the formulae

$$\kappa_1 = \mu'_1 = \text{mean}, \quad \kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2. \tag{30}$$

Thus all the cumulants after the first are independent of the value
with respect to which the cumulative function is constructed. The
mean, and the moments about the mean, may therefore be found by
calculating the cumulative function with respect to any convenient
origin. Also, from the last of the relations (30), we have

$$\frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4}{\mu_2^2} - 3 = \beta_2 - 3, \tag{31}$$

and this is the excess of kurtosis, as defined at the close of § 6.

Example. Since by § 15, Ex. 1, the m.g.f. of the exponential distribution,
with respect to the origin, is

$$M(t) = (1 - \sigma t)^{-1},$$

the cumulative function is

$$K(t) = -\log(1 - \sigma t) = \sigma t + \sigma^2t^2/2 + \sigma^3t^3/3 + \dots.$$

Thus $\kappa_r = \sigma^r (r-1)!$.

The mean, being the coefficient of t, is $\sigma$. The variance is $\sigma^2$, $\mu_3 = 2\sigma^3$, and

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = (3! + 3)\sigma^4 = 9\sigma^4.$$

Further, since the m.g.f. of the sum of two independent variates,
x and y, is equal to the product of their m.g.f.'s, it follows that the
cumulative function of the distribution of $x+y$ is the sum of those
of x and y. Equating coefficients of like powers of t, we have the
simple result that the rth cumulant of $x+y$ is the sum of the rth
cumulants of x and y. This is the additive property of cumulants. The
theorem is obviously true for the sum of any finite number of
independent variates. The particular case of second cumulants gives again
the theorem on the variance of the sum of several independent variates.
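The relations between cumulants and moments, and the additive property, lend themselves to a short numerical illustration. The Python sketch below uses the cumulants $\sigma^r (r-1)!$ of the exponential distribution from the Example; the particular means 1·5 and 0·5 and the function names are assumptions made for illustration.

```python
import math

def exponential_cumulant(r, sigma):
    """kappa_r = sigma^r (r-1)! for the exponential distribution with mean sigma."""
    return sigma**r * math.factorial(r - 1)

def central_moments(k2, k3, k4):
    """Relations (30) solved for the moments: mu2 = k2, mu3 = k3, mu4 = k4 + 3*k2^2."""
    return k2, k3, k4 + 3.0 * k2**2

sigma = 1.5
k2, k3, k4 = (exponential_cumulant(r, sigma) for r in (2, 3, 4))
mu2, mu3, mu4 = central_moments(k2, k3, k4)     # mu4 should equal 9*sigma^4

# Additive property: for the sum of two independent exponential variates with
# means 1.5 and 0.5, each cumulant is the sum of the separate cumulants.
sum_k = [exponential_cumulant(r, 1.5) + exponential_cumulant(r, 0.5)
         for r in (1, 2, 3, 4)]
mu4_of_sum = sum_k[3] + 3.0 * sum_k[1] ** 2     # fourth moment of the sum about its mean
```

The same fourth moment follows from the moments directly, as $\mu_4(x) + \mu_4(y) + 6\mu_2(x)\mu_2(y)$, which gives 49·5 for these values.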

The above property of cumulants may be used to estimate the
average corrections* to be applied to the moments of a grouped
distribution, with specified class interval c, when the interval-mesh
is located at random on the ungrouped distribution. The corrections
found, which are of the same form as Sheppard's adjustments, may
be wrong in any individual instance, but their average effect in a
large number of cases will be correct. In any class the mid-value $x_1$,
from which the moments of the grouped distribution are calculated,
is the sum of the true value x of the observation and the grouping
error $x_1 - x$. In consequence of the random location of the class
limits, the grouping error is uniformly distributed over the range
$-\tfrac12 c$ to $\tfrac12 c$. The average cumulants of the grouped distribution
will differ from those of the ungrouped by the cumulants of the
grouping error. Now the m.g.f. for the uniform distribution of
error is

$$M(t) = \frac{2}{ct}\sinh(\tfrac12 ct),$$

and its cumulative function is therefore

$$K(t) = \frac{c^2}{12}\frac{t^2}{2!} - \frac{c^4}{120}\frac{t^4}{4!} + \frac{c^6}{252}\frac{t^6}{6!} - \dots.$$

If then $k_r$ are the cumulants of the grouped distribution, those of
the ungrouped distribution are

$$\kappa_1 = k_1, \quad \kappa_2 = k_2 - \frac{c^2}{12}, \quad \kappa_3 = k_3, \quad \kappa_4 = k_4 + \frac{c^4}{120}, \quad \dots.$$

* Cf. Cornish and Fisher, 1937, 7, pp. 3-4, and Kendall, 1943, 2, pp. 74-6.

Denoting the moments of the grouped distribution by $m'_1$ and $m_r$,
we therefore have for those of the ungrouped distribution

$$\mu'_1 = m'_1, \quad \mu_2 = m_2 - \frac{c^2}{12}, \quad \mu_3 = m_3, \tag{32}$$

and

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = m_4 - 3m_2^2 + \frac{c^4}{120} + 3\left(m_2 - \frac{c^2}{12}\right)^2 = m_4 - \tfrac12 c^2 m_2 + \frac{7c^4}{240}. \tag{33}$$
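The adjustments for grouping amount to a few lines of arithmetic. The following Python sketch applies the corrections $c^2/12$ to the second moment and $-\tfrac12 c^2 m_2 + 7c^4/240$ to the fourth, leaving the mean and third moment unaltered; the function name and the numerical values in the usage lines are hypothetical.

```python
def sheppard_adjust(m1, m2, m3, m4, c):
    """Average ungrouped moments from grouped ones, for class interval c.

    m1 is the grouped mean; m2, m3, m4 are grouped moments about the mean.
    """
    mu1 = m1                                  # mean needs no adjustment
    mu2 = m2 - c**2 / 12.0                    # variance diminished by c^2/12
    mu3 = m3                                  # third moment needs no adjustment
    mu4 = m4 - 0.5 * c**2 * m2 + 7.0 * c**4 / 240.0
    return mu1, mu2, mu3, mu4

# Hypothetical grouped moments with class interval c = 1
mu1, mu2, mu3, mu4 = sheppard_adjust(10.0, 4.0, 0.5, 50.0, 1.0)

# The same fourth moment by way of the cumulants: kappa4 + 3*kappa2^2
kappa4 = (50.0 - 3.0 * 4.0**2) + 1.0 / 120.0
mu4_via_cumulants = kappa4 + 3.0 * (4.0 - 1.0 / 12.0) ** 2
```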


The equation (32) shows that the estimate $m'_1$ of the mean, and the
estimate $m_3$ of the third moment about the mean, as found from the
grouped distribution, are sufficiently accurate. The calculated
variance $m_2$ should be diminished by $c^2/12$; while the adjustment to
the fourth moment is given by (33).

EXAMPLES II

1. Two cards are drawn at random from a well-shuffled pack of
52. Show that the chance of drawing two aces is 1/221.

2. The chance of throwing a 6 at least once in two throws of a
die is 11/36.

3. A and B toss a coin alternately on the understanding that
the first to obtain heads wins the toss. Show that their respective
chances of winning are 2/3 and 1/3.

4. Four persons are chosen at random from a group containing
3 men, 2 women and 4 children. The chance that exactly two of
them will be children is 10/21.

5. From an urn containing r red and s white balls, $a+b$ balls are
drawn at random without replacement ($a < r$, $b < s$). Show that the
probability of a red and b white balls is $\binom{r}{a}\binom{s}{b}\Big/\binom{r+s}{a+b}$.

6. Show that, in a single throw with two dice, the chance of
throwing more than 7 is equal to that of throwing less than 7,
each being 5/12.

7. A and B take turns in throwing two dice, the first to throw 9
being awarded the prize. Show that their chances of winning are
in the ratio 9 : 8.

8. Three men toss in succession for a prize to be given to the one
who first obtains heads. Show that their chances of winning are
4/7, 2/7 and 1/7.

9. Eight coins are thrown simultaneously. Show that the chance
of obtaining at least six heads is 37/256.

10. The expectation of the number of failures preceding the first
success in an indefinite series of independent trials, with constant
probability p of success, is

$$qp + 2q^2p + 3q^3p + \dots = q/p.$$

(Uspensky, 1937, 2, p. 178, Ex. 3.)

11. A point P is taken at random in a line AB, of length 2a, all
positions of the point being equally likely. Show that the expected
value of the area of the rectangle AP . PB is $2a^2/3$, and that the
probability of the area exceeding $\tfrac12 a^2$ is $1/\sqrt{2}$.

12. From a point on the circumference of a circle of radius a,
a chord is drawn in a random direction (i.e. all directions are equally
likely). Show that the expected value of the length of the chord
is $4a/\pi$, and that the variance of the length is $2a^2(1 - 8/\pi^2)$. Also
show that the chance is 1/3 that the length of the chord will exceed
the length of the side of an equilateral triangle inscribed in the
circle.

13. A chord of a circle of radius a is drawn parallel to a given
straight line, all distances from the centre of the circle being equally
likely. Show that the expected value of the length of the chord is
$\tfrac12\pi a$, and that the variance of the length is $\frac{a^2}{12}(32 - 3\pi^2)$. Also show
that the chance is 1/2 that the length of the chord will exceed the
length of the side of an equilateral triangle inscribed in the circle.

14. Two different digits are chosen at random from the set
1, 2, 3, ..., 8. Show that the probability that the sum of the digits
will be equal to 5 is the same as the probability that their sum will
exceed 13, each being 1/14. Also show that the chance of both digits
exceeding 5 is 3/28.

15. In Poisson's distribution the variate takes the values
0, 1, 2, 3, ... with probabilities proportional to $1, m, m^2/2!, m^3/3!, \dots$,
both sequences being infinite. Show that the mean of the
distribution is m, and the variance also m.

16. The equation § 15 (26), written for $a = \bar{x}$, is equivalent to

$$1 + \mu_2 t^2/2! + \mu_3 t^3/3! + \dots = (1 - \bar{x}t + \bar{x}^2t^2/2! - \dots)(1 + \mu'_1 t + \mu'_2 t^2/2! + \dots).$$

Equating coefficients of like powers of t, deduce the relations
§ 4 (17), (18).

17. Prove that the next two formulae corresponding to § 16 (30)
are

$$\kappa_5 = \mu_5 - 10\mu_2\mu_3, \qquad \kappa_6 = \mu_6 - 15\mu_2\mu_4 - 10\mu_3^2 + 30\mu_2^3.$$

18. Two independent variates are each uniformly distributed
within the range $-a$ to $a$. Show that their sum x has a probability
density given by

$$\phi(x) = (2a + x)/4a^2 \quad (-2a < x < 0),$$

$$\phi(x) = (2a - x)/4a^2 \quad (0 < x < 2a).$$

Verify that the m.g.f., calculated from this value of $\phi(x)$, is equal
to $(\sinh at / at)^2$.
19. On the x-axis $n+1$ points are taken independently between
the origin and $x = 1$, all positions being equally likely. Show that
the probability that the $(k+1)$th of these points, counted from the
origin, lies in the interval $x - \tfrac12 dx$ to $x + \tfrac12 dx$ is

$$(n+1)\binom{n}{k} x^k (1-x)^{n-k}\,dx.$$

Verify that the integral of this expression, from $x = 0$ to $x = 1$, is
unity. (Aitken, 1939, 1, p. 71.)

20. Show that, for the binomial distribution,

$$\kappa_4 = npq(1 - 6pq).$$

21. Show that the expected value of the product of the numbers
of points showing after an unbiased throw of n ordinary dice is $(7/2)^n$.

22. Show that, if p may be varied, the probability of m
successes in a series of n independent trials, with the same
probability p of success, is greatest when $p = m/n$.

23. Show that, if y and z are independent random values of a
variate x, the expected value of $(y - z)^2$ is twice the variance of the
distribution of x.

24. Defining the harmonic mean (h.m.) of a variate x as the
reciprocal of the expected value of $1/x$, show that the h.m. of the
variate which ranges from 0 to $\infty$ with probability density $x^n e^{-x}/n!$
is n, given that n is positive.

SOME STANDARD DISTRIBUTIONS
Binomial and Poissonian Distributions

17. The binomial distribution

We have considered the binomial distribution in connection
with the probabilities of the various numbers of successes in a
series of n independent trials, in each of which the chance of success
is equal to p. The mean and the variance of the distribution were
determined by means of the property that the expected value of a
sum of variates is equal to the sum of their expected values. These
may also be found by direct calculation. Thus

$$E(x) = 0\,.\,q^n + 1\,.\,nq^{n-1}p + 2\,.\binom{n}{2}q^{n-2}p^2 + \dots + np^n$$

$$= np\left[q^{n-1} + (n-1)q^{n-2}p + \binom{n-1}{2}q^{n-3}p^2 + \dots + p^{n-1}\right]$$

$$= np(q+p)^{n-1} = np. \tag{1}$$

Thus the mean of the distribution is np. In order to find the variance
calculate first the second moment about $x = 0$. Thus

$$\mu'_2 = np\left[q^{n-1} + 2(n-1)q^{n-2}p + 3\binom{n-1}{2}q^{n-3}p^2 + \dots + np^{n-1}\right].$$

Now the expression in brackets is the first moment, about the value
$x = -1$, for the binomial distribution in which n is replaced by
$n-1$. This first moment, being the excess of the mean of the
distribution above $x = -1$, is equal to $(n-1)p + 1$, in virtue of (1).
Consequently

$$\mu'_2 = np[(n-1)p + 1],$$

and the variance of the binomial distribution is then given by

$$\mu_2 = \mu'_2 - [E(x)]^2 = np[1 + (n-1)p] - (np)^2 = np(1-p) = npq, \tag{2}$$

and the standard deviation by

$$\sigma = \sqrt{npq}. \tag{3}$$

The proportion or relative frequency of successes is the number of
successes divided by n. Hence the mean value of this proportion
is p, and its standard deviation is $\sqrt{pq/n}$.
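The direct calculation of the mean and variance can be mirrored numerically by summing over the binomial probabilities term by term. The Python sketch below is illustrative only; the values $n = 12$, $p = 0\cdot3$ and the function name are assumptions.

```python
import math

def binomial_moments(n, p):
    """Mean, variance and s.d. of the number of successes, summed term by term."""
    q = 1.0 - p
    probs = [math.comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))          # first moment about 0
    second = sum(r * r * pr for r, pr in enumerate(probs))    # second moment about 0
    variance = second - mean**2
    return mean, variance, math.sqrt(variance)

mean, variance, sd = binomial_moments(12, 0.3)   # expect np = 3.6 and npq = 2.52
```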

A binomial frequency distribution is one in which the relative
frequencies of the values 0, 1, 2, ..., n of the variate are equal to
their probabilities in the above distribution. As an example we may
take the distribution of expected frequencies of 0, 1, ..., n successes
when the set of n trials is to be made N times. For, in virtue of (1)
and the known probability of r successes, the expected frequency of
r successes in N sets of n trials each is $N\binom{n}{r}p^r q^{n-r}$. The properties
of the distribution are the same whether it is regarded as one of
probability or one of frequency. But the reader may note that, in
a theoretical frequency distribution like the above, the individual
frequencies are not necessarily integral.

Example 1. Verify the above value of $\mu'_2$ by direct algebraical
simplification of the expression.

Example 2. Show that the m.g.f. of the binomial distribution with respect
to its mean is

$$(qe^{-pt} + pe^{qt})^n = \left[1 + pq\frac{t^2}{2!} + pq(q^2 - p^2)\frac{t^3}{3!} + pq(q^3 + p^3)\frac{t^4}{4!} + \dots\right]^n,$$

and deduce that

$$\mu_4 = npq[1 + 3(n-2)pq].$$

18. Poisson’s distribution 

An important distribution, associated with the name of Poisson,
is one obtainable from the binomial distribution by putting $p = m/n$,
where m is a constant, and letting n increase indefinitely. Thus the
number of trials in the series becomes very large, and the
probability of success in a trial very small. Now it can be shown* that,
on the above assumption as n tends to infinity,

$$\lim_{n\to\infty}\binom{n}{r}p^r q^{n-r} = \frac{m^r e^{-m}}{r!}.$$

* See Mathematical Note I, at the end of this chapter.

Thus, in the limiting form of the distribution, the probability of r
successes in the infinite series of trials is $m^r e^{-m}/r!$. The chances of
0, 1, 2, 3, ... successes in the infinite series are

$$e^{-m}, \quad me^{-m}, \quad \frac{m^2}{2!}e^{-m}, \quad \frac{m^3}{3!}e^{-m}, \quad \dots \tag{4}$$

respectively. The reader may show, as in the case of the binomial
distribution, that the most probable number of successes is the
integral part of m. It is obvious that the sum of the probabilities
of 0, 1, 2, ... successes is $e^{-m}e^{m} = 1$, as it should be.
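The passage to the limit can be watched numerically by comparing the binomial probabilities, with $p = m/n$, against the Poissonian ones as n grows. In the Python sketch below the value $m = 2$, the values of n and the function names are illustrative assumptions.

```python
import math

def binomial_prob(n, r, p):
    """Probability of r successes in n trials with chance p of success."""
    return math.comb(n, r) * p**r * (1.0 - p)**(n - r)

def poisson_prob(r, m):
    """Limiting probability m^r e^(-m) / r! of r successes."""
    return m**r * math.exp(-m) / math.factorial(r)

def max_gap(n, m, r_max=8):
    """Largest difference between the two sets of probabilities for r < r_max."""
    return max(abs(binomial_prob(n, r, m / n) - poisson_prob(r, m))
               for r in range(r_max))

for n in (10, 100, 10000):
    print(n, max_gap(n, 2.0))   # the gap shrinks as n increases
```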

The mean value and the variance of Poisson's distribution may be
deduced from those of the binomial by putting $p = m/n$, and
letting n tend to infinity. Thus the mean value is $\lim np = m$.
Similarly, the variance is $\lim npq = \lim mq = m$, since q tends to
unity as p tends to zero. These results may, of course, be deduced
by direct calculation. Thus the mean, being the first moment about
$x = 0$, is given by

$$\bar{x} = \mu'_1 = e^{-m}(0 + 1\,.\,m + 2\,.\,m^2/2! + 3\,.\,m^3/3! + \dots)$$

$$= me^{-m}(1 + m + m^2/2! + m^3/3! + \dots)$$

$$= m. \tag{5}$$

Similarly

$$\mu'_2 = e^{-m}(0^2 + 1^2\,.\,m + 2^2\,.\,m^2/2! + 3^2\,.\,m^3/3! + \dots) = m(m+1),$$

as the reader may easily verify. Consequently

$$\sigma^2 = \mu_2 = \mu'_2 - \bar{x}^2 = m(m+1) - m^2 = m, \tag{6}$$

as found above. The s.d. is therefore $\sqrt{m}$.

The same results may be obtained by using the generating
functions of §§ 15 and 16. Thus the m.g.f. of the Poissonian
distribution with respect to the origin is

$$M_0(t) = e^{-m}(1 + me^{t} + m^2e^{2t}/2! + m^3e^{3t}/3! + \dots) = e^{-m}\exp(me^{t}) = \exp[m(e^{t} - 1)],$$

and the cumulative function is therefore

$$K(t) = m(e^{t} - 1) = m(t + t^2/2! + t^3/3! + \dots).$$

Thus $\kappa_1 = \kappa_2 = \kappa_3 = \dots = m$. The mean and the variance are each
equal to m, as is also the third moment about the mean. The fourth
moment about the mean is

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = m + 3m^2.$$

Further, if n independent variates $x_i$ conform to Poissonian
distributions with means $m_i$ ($i = 1, 2, \dots, n$), it follows from the
theorem of § 15 that the m.g.f. of their sum is $\exp[(e^{t} - 1)\sum m_i]$.
But this is the m.g.f. of a Poissonian distribution whose mean is
$\sum m_i$. Hence the theorem:

The sum of any finite number of independent Poissonian variates
is itself a Poissonian variate, with mean equal to the sum of the means
of the separate variates.*
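The theorem can also be confirmed by direct convolution of the probabilities, without generating functions. In the Python sketch below the means 1·2 and 0·8 and the function names are assumptions chosen for illustration.

```python
import math

def poisson_prob(r, m):
    """Probability m^r e^(-m) / r! of the value r in a Poissonian distribution."""
    return m**r * math.exp(-m) / math.factorial(r)

def sum_distribution(m1, m2, r):
    """P(x1 + x2 = r) for independent Poissonian variates, by direct convolution."""
    return sum(poisson_prob(k, m1) * poisson_prob(r - k, m2) for k in range(r + 1))

# Compare with a single Poissonian distribution of mean m1 + m2 = 2.0
for r in range(6):
    print(r, sum_distribution(1.2, 0.8, r), poisson_prob(r, 2.0))
```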

A Poissonian frequency distribution is one in which the relative
frequencies of the values 0, 1, 2, 3, ... of the variate are equal to the
probabilities in the above distribution. As an example we have the
distribution of the expected frequencies of the various numbers of
successes when the extensive series of trials is repeated N times.
But in such a theoretical distribution the individual frequencies
are not necessarily integral. Frequency distributions which are
approximately Poissonian do arise in connection with the number
of happenings of a rare event in an extensive series of trials.

Example. In 1,000 consecutive issues of The Utopian Seven-Daily Chronicle
the deaths of centenarians were recorded,† the number x having frequency
f according to the table

x:   0    1    2    3    4    5    6    7    8
f: 229  325  257  119   50   17    2    1    0

Show that the distribution is roughly Poissonian by calculating its mean
($m = 1\cdot5$), and then the frequencies in the Poissonian distribution with the
same mean and the same total frequency of 1,000. The latter are
approximately 223·1, 334·7, 251·0, 125·5, 47·1, 14·1, 3·5, 0·8, 0·2. Also calculate the
variance of the given distribution, and compare it with the mean.
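The calculation asked for in this Example is short enough to set out in full. The Python sketch below takes the frequencies as 229, 325, 257, 119, 50, 17, 2, 1, 0, which is the reading of the table that totals 1,000 and gives the stated mean 1·5; the variable names are hypothetical.

```python
import math

counts = [229, 325, 257, 119, 50, 17, 2, 1, 0]   # frequencies f for x = 0, 1, ..., 8

total = sum(counts)
mean = sum(x * f for x, f in enumerate(counts)) / total
variance = sum(x * x * f for x, f in enumerate(counts)) / total - mean**2

# Poissonian frequencies with the same mean and the same total frequency
fitted = [total * mean**x * math.exp(-mean) / math.factorial(x)
          for x in range(len(counts))]

for x, (f, g) in enumerate(zip(counts, fitted)):
    print(x, f, round(g, 1))
```

The variance of the observed distribution comes out close to the mean, as it should for a distribution that is roughly Poissonian.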

* An elementary proof of this theorem will be found in Ex. 4 at the end
of this chapter.

† Cf. Lucy Whittaker, Biometrika, vol. X, 1914, p. 36.

The Normal Distribution


19. Derivation from the binomial distribution 
The binomial and Poissonian are distributions of discrete values.
We pass now to a continuous distribution of fundamental importance.
This is the normal distribution, which may be derived from the
binomial in the following manner. In the latter the probability of
the value r of the variate is $\binom{n}{r}p^r q^{n-r}$. Let x be the deviation of
the variate from the mean value np of r, so that

$$r = np + x. \tag{7}$$

Then the probability of the value x, being the same as that of the
corresponding value of r, is

$$\binom{n}{np+x}p^{np+x}q^{nq-x}.$$

It can be shown* that, for large values of n, this probability can be
expressed

$$\frac{1}{\sqrt{2\pi npq}}\exp\left(-\frac{x^2}{2npq}\right)(1 + \epsilon), \tag{8}$$

where £ tends to zero as n tends to infinity, provided that neither 
p nor q is very small, and a; is of lower order than n^. 

Now introduce a variable z defined by 


x 



and examine the probability distribution of z, and its limiting form 
as n tends to infinity. The probability for any interval in the range 
of z is equal to that of the corresponding interval for x. To unit 
interval in the range of x there corresponds the interval 1/^ln in 
that of z; and, when n tends to infinity, this may be denoted by dz. 
Then the probability that z falls in the interval dz is the probability 
that X falls in unit interval which includes the value x; and this is 
given by (8) which, when n tends to infinity, takes the form 

^ For the detuls of this step see Mathematical Note II, at the end of this 
chapter. 
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Thd limitiiig form of tho distribution of z is thus a continuous dis* 
tribution with probability density 

where = pq. This is the normal probability function, and the 
corresponding continuous distribution is the normal distribution. 
Since ^J(npq) is the s.d. of the binomial distribution, and therefore 
of the variate x , it follows that ^(pq) is the s.d. of z. Hence <r in (9) 
is the S.D. of the normal distribution. Since ^{z) is an even function 
of 2 , the distribution is symmetrical about 2 = 0, which is therefore 
the mean value of the distribution. Also, since z must lie between 
— 00 and oo, the integral of (9) between these limits is equal to 
unity, so that 

/-«, ^ "" O’ (10) 

This is an important integral. An independent proof of the formula 
is given in Mathematical Note III, at the end of this chapter. 

A normal frequency distribution is a continuous one, in which
the relative frequency density f(x) is identical with the function $\phi(x)$
defined by (9). A distribution of discrete values cannot be normal.
If, however, the total frequency is large, it is possible for the discrete 
distribution to approximate to the normal. The meaning of such an 
approximation was explained in § 6. 

In later chapters, particularly in connection with sampling theory, 
we shall meet distributions which are accurately normal, and others 
which are approximately so. Since the normal distribution is an 
ideal to which some distributions attain, and to which others 
approximate, it is important to know its properties. Fortunately, 
the quantities associated with the distribution are easy to calculate. 
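How closely a binomial distribution with large n is approximated by the normal form can be checked numerically. The sketch below compares exact binomial probabilities with the normal density of variance npq taken at x = r - np; the values n = 400 and p = 0.4 are illustrative assumptions, not figures from the text.

```python
import math

def binomial_pmf(n, p, r):
    """Exact binomial probability of the value r in n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def normal_approx(n, p, r):
    """Normal density with variance npq, evaluated at the deviation x = r - np."""
    q = 1 - p
    x = r - n * p
    return math.exp(-x * x / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)

n, p = 400, 0.4  # illustrative values only
for r in (150, 155, 160, 165, 170):
    exact = binomial_pmf(n, p, r)
    approx = normal_approx(n, p, r)
    print(r, round(exact, 5), round(approx, 5))
```

The agreement is closest near the mean np and deteriorates slowly as |x| grows, in keeping with the condition that x be of lower order than a power of n.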

20. Some properties of the normal distribution 
As we have just seen, a continuous variate x is normally distributed,
with mean zero and s.d. $\sigma$, when the range of the variate
is from $-\infty$ to $\infty$, and the probability density is given by

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{11}$$

The probability curve is therefore the curve

$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{12}$$

This curve is symmetrical about the line x = 0 through the mean of
the distribution. It is a uni-modal curve, in which the ordinates
decrease rapidly as |x| increases. By equating to zero the second
derivative of y, the reader will easily verify that the points of
inflexion on the curve are given by $x = \pm\sigma$. The ordinates of the
normal curve are given in Table 1, corresponding to values of $x/\sigma$
at intervals of 0.01.



Table 1. Ordinates of the Normal Curve

The origin of x is at the mean. The table gives the values of
$\frac{1}{\sqrt{2\pi}}\exp(-x^2/2\sigma^2)$, which is $\sigma$ times the ordinate of the normal curve $y = \phi(x)$.

x/σ     0.07    0.08    0.09
0.0    .3980   .3977   .3973
0.1    .3932   .3925   .3918
0.2    .3847   .3836   .3825
0.3    .3725   .3712   .3697
0.4    .3572   .3555   .3538
0.5    .3391   .3372   .3352
0.6    .3187   .3166   .3144
0.7    .2966   .2943   .2920
0.8    .2732   .2709   .2685
0.9    .2492   .2468   .2444
1.0    .2251   .2227   .2203
1.1    .2012   .1989   .1965
1.2    .1781   .1758   .1736
1.3    .1561   .1539   .1518
1.4    .1354   .1334   .1315
1.5    .1163   .1145   .1127
1.6    .0989   .0973   .0957
1.7    .0833   .0818   .0804
1.8    .0694   .0681   .0669
1.9    .0573   .0562   .0551
2.0    .0468   .0459   .0449
2.1    .0379   .0371   .0363
2.2    .0303   .0297   .0290
2.3    .0241   .0235   .0229
2.4    .0189   .0184   .0180
2.5    .0147   .0143   .0139
2.6    .0113   .0110   .0107
2.7    .0086   .0084   .0081
2.8    .0065   .0063   .0061
2.9    .0048   .0047   .0046
3.0    .0036   .0035   .0034
3.1    .0026   .0025   .0025
3.2    .0019   .0018   .0018
3.3    .0014   .0013   .0013
3.4    .0010   .0009   .0009
3.5    .0007   .0007   .0006
3.6    .0005   .0005   .0004
3.7    .0003   .0003   .0003
3.8    .0002   .0002   .0002
3.9    .0002   .0001   .0001

Since the distribution is symmetrical, the moments of odd order
about the mean are all zero. To find the moments of even order we
observe that the moment of order 2n about the mean is

$$\mu_{2n} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2n} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx.$$

Integration by parts then gives

$$\mu_{2n} = \frac{1}{\sigma\sqrt{2\pi}} \left\{ \left[ -\sigma^2 x^{2n-1} \exp\left(-\frac{x^2}{2\sigma^2}\right) \right]_{-\infty}^{\infty} + (2n-1)\sigma^2 \int_{-\infty}^{\infty} x^{2n-2} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx \right\}.$$

The expression in square brackets vanishes at both limits; and we
may therefore write the relation

$$\mu_{2n} = (2n-1)\,\sigma^2 \mu_{2n-2}.$$

Repeated application of this reduction formula shows that

$$\mu_{2n} = (2n-1)(2n-3) \ldots 3.1\, \sigma^{2n} \mu_0.$$

But, for any distribution, $\mu_0 = 1$, and therefore

$$\mu_{2n} = (2n-1)(2n-3) \ldots 3.1\, \sigma^{2n}. \tag{13}$$

In particular, the variance is $\sigma^2$ and the s.d. is $\sigma$, as proved above.
Similarly,

$$\mu_4 = 3\sigma^4. \tag{14}$$
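Formula (13) can be checked against a direct numerical integration of the moment integral. The following is a minimal sketch using the midpoint rule; the function names, the range of integration (12 standard deviations each way) and the step count are assumptions chosen for illustration.

```python
import math

def moment_by_formula(n, sigma):
    """Formula (13): mu_{2n} = (2n-1)(2n-3)...3.1 * sigma^(2n)."""
    product = 1.0
    for k in range(1, 2 * n, 2):  # odd factors 1, 3, ..., 2n-1
        product *= k
    return product * sigma ** (2 * n)

def moment_by_integration(order, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of the moment of the given order about the mean."""
    a = -half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        density = math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += x ** order * density
    return total * h

sigma = 1.5  # illustrative value
for n in (1, 2, 3):
    print(2 * n, moment_by_formula(n, sigma), moment_by_integration(2 * n, sigma))
```

With n = 2 the formula reduces to (14), and the odd moments computed by the same integration come out as zero to within rounding error.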

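The moments may also be obtained from the moment generating
function of the normal distribution. This function, relative to the
mean x = 0, is

$$M(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(tx - \frac{x^2}{2\sigma^2}\right) dx
= \exp(\tfrac{1}{2}\sigma^2 t^2)\, \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{-\frac{(x - \sigma^2 t)^2}{2\sigma^2}\right\} dx
= \exp(\tfrac{1}{2}\sigma^2 t^2), \tag{15}$$

in virtue of (10). Since the expansion of this expression involves
only even powers of t, the moments of odd order about the mean are
zero, as is obvious from symmetry. The moment of order 2n is the
coefficient of $t^{2n}/(2n)!$ in the expansion; and this has the value

$$\mu_{2n} = \frac{(2n)!}{n!} \left(\tfrac{1}{2}\sigma^2\right)^n = 1.3.5 \ldots (2n-1)\, \sigma^{2n}$$

as found above. The cumulative function is, in virtue of (15),

$$K(t) = \log M(t) = \tfrac{1}{2}\sigma^2 t^2,$$

so that all the cumulants after the second are equal to zero.

The mean deviation from the mean is easily calculated. Its value is

$$\int_{-\infty}^{\infty} |x|\, \phi(x)\, dx = 2 \int_0^{\infty} x\, \phi(x)\, dx$$

in virtue of symmetry; and this integral

$$= \sigma\sqrt{2/\pi} = 0.7979\,\sigma = \tfrac{4}{5}\sigma \text{ approximately.}$$

In a normal distribution with mean $\bar{x}$ and s.d. $\sigma$, $x - \bar{x}$ is the
deviation of the variable from the mean, and the probability density
is

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \bar{x})^2}{2\sigma^2}\right\}. \tag{16}$$

Conversely, a probability density of this form defines a normal
distribution with mean $\bar{x}$ and s.d. $\sigma$.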

21. Probabilities and relative frequencies for various intervals 
The probability corresponding to any interval in the range of
the variate is represented by the area under the curve (12) within
that interval; and the same is true for the relative frequency in the
case of a normal frequency distribution. In particular, the probability
for the interval from the mean (zero) to the value x is given
by the integral

$$\frac{1}{\sigma\sqrt{2\pi}} \int_0^x \exp\left(-\frac{y^2}{2\sigma^2}\right) dy.$$

Putting $t = y/\sigma$ we see that this is equivalent to

$$\frac{1}{\sqrt{2\pi}} \int_0^{x/\sigma} \exp(-\tfrac{1}{2}t^2)\, dt. \tag{17}$$

This probability is therefore a function of $x/\sigma$. In the accompanying
table the values of this integral are given for different
values of $x/\sigma$ at intervals of 0.01.

Using Table 2 the reader will see that, if $x/\sigma = 1$, the area is
0.3413. The area within the interval $x = -\sigma$ to $x = \sigma$ is therefore
0.6826, and the area outside this interval 0.3174, which is less than
1/3. In other words, the probability that a random value of a normal
variate will deviate more than $\sigma$ from the mean, is less than 1/3.
Similarly, for $x/\sigma = 2$ the value of the integral (17) is 0.4772, and
the area for the interval $x = -2\sigma$ to $x = 2\sigma$ is 0.9544. The area
outside this interval is thus 0.0456. The probability that a random
value of x will deviate more than $2\sigma$ from the mean is thus about
4½ %. Similarly, the area outside the range $x = -2.5\sigma$ to $x = 2.5\sigma$
is 0.0124, which is about 1¼ % of the whole; and that outside the
range $x = -3\sigma$ to $x = 3\sigma$ is only 0.0027, or about ¼ % of the whole.
The reader may verify, and will find it convenient to remember,
that the deviation from the mean which is exceeded with a probability
of 5 % is $1.96\sigma$; and that which is exceeded with a
probability of 1 % is $2.58\sigma$.
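The areas quoted above can be reproduced without the table, since the integral (17) is expressible through the error function. A minimal sketch, in which `area_from_mean` is a hypothetical helper name and the only library call is Python's standard `math.erf`:

```python
import math

def area_from_mean(u):
    """Integral (17): area under the normal curve from the mean to x = u * sigma."""
    return 0.5 * math.erf(u / math.sqrt(2))

for u in (1, 2, 2.5, 3, 1.96, 2.58):
    inside = 2 * area_from_mean(u)  # area between -u*sigma and +u*sigma
    print(u, round(area_from_mean(u), 4), round(1 - inside, 4))
```

Each printed line gives the deviation in units of the s.d., the area from the mean to that deviation, and the area outside the symmetrical interval.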


Table 2. Area under the Normal Curve

The area is measured from the mean, x = 0, to the ordinate at x.
The results are given for values of $x/\sigma$ at intervals of 0.01

From Table 2 we may find the quartiles of the normal distribution
with mean at x = 0. The relative frequency from the mean to
the upper quartile is 0.25, which is therefore the corresponding area
under the curve. Interpolating with the aid of Table 2 we find
$x/\sigma = 0.6745$, and the upper quartile is therefore $0.6745\sigma$. Similarly,
the lower quartile is $-0.6745\sigma$.

Example 1. Using Table 2 show that the 6th, 7th, 8th and 9th deciles
are $0.253\sigma$, $0.524\sigma$, $0.842\sigma$ and $1.282\sigma$ respectively.

Example 2. For a normal distribution, with mean $\bar{x} = 1$ and s.d. 3, find
the probabilities for the intervals

(i) x = 3.43 to x = 6.19, (ii) x = -1.43 to x = 6.19.

Let x' denote the deviation from the mean. Then, in the first part of the
example, the values of $x'/\sigma$ for the bounds of the interval are 0.81 and 1.73.
The areas from the mean to the bounds of the interval are 0.2910 and 0.4582.
The area corresponding to the interval is the difference of these areas, and
is therefore 0.1672.

In the second part the values of $x'/\sigma$ corresponding to the bounds of the
interval are -0.81 and 1.73. The area from the lower bound to the mean is
0.2910, and that from the mean to the upper bound is 0.4582 as before.
The required area is the sum of these, viz. 0.7492.
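Both examples can be worked without interpolation in the table. The sketch below finds quartiles and deciles by bisection on the area function, and the interval probabilities of Example 2 from the cumulative distribution; the helper names and the bisection tolerance are assumptions made for illustration.

```python
import math

def area_from_mean(u):
    """Area under the normal curve from the mean to x = u * sigma."""
    return 0.5 * math.erf(u / math.sqrt(2))

def deviate_for_area(area, tol=1e-10):
    """Value of x/sigma whose area from the mean equals `area` (0 <= area < 0.5)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_from_mean(mid) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def interval_probability(lower, upper, mean, sd):
    """Probability that a normal variate falls between lower and upper."""
    return area_from_mean((upper - mean) / sd) - area_from_mean((lower - mean) / sd)

print("upper quartile:", round(deviate_for_area(0.25), 4))
print("deciles 6 to 9:", [round(deviate_for_area(a), 3) for a in (0.1, 0.2, 0.3, 0.4)])
print("Example 2 (i): ", round(interval_probability(3.43, 6.19, 1, 3), 4))
print("Example 2 (ii):", round(interval_probability(-1.43, 6.19, 1, 3), 4))
```

Because `math.erf` is an odd function, a bound below the mean contributes a negative area, so the same subtraction covers both parts of Example 2.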

22. Distribution of a sum of independent normal variates 

Consider the normal variate x, with mean a and variance $\sigma^2$.
Its m.g.f. with respect to the origin is

$$M(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{tx - \frac{(x-a)^2}{2\sigma^2}\right\} dx
= \exp(at + \tfrac{1}{2}\sigma^2 t^2), \tag{18}$$

in virtue of (10). The cumulative function is

$$K(t) = \log M(t) = at + \tfrac{1}{2}\sigma^2 t^2.$$

If c is a constant, cx is normally distributed with mean ca and
variance $c^2\sigma^2$. Hence the m.g.f. of the distribution of cx is

$$\exp(cat + \tfrac{1}{2}c^2\sigma^2 t^2).$$

Let $x_i$ (i = 1, ..., n) be n independent normal variates, with means
$a_i$ and variances $\sigma_i^2$. Then, by the theorem of § 15, the m.g.f. of the
sum $\sum c_i x_i$ is

$$\exp\left\{t \sum c_i a_i + \tfrac{1}{2} t^2 \sum c_i^2 \sigma_i^2\right\}.$$

But this is the m.g.f. of a normal distribution with mean $\sum c_i a_i$ and
variance $\sum c_i^2 \sigma_i^2$. Hence the theorem:

If the independent variates $x_i$ (i = 1, ..., n) are normally distributed
with means $a_i$ and variances $\sigma_i^2$, the variate $\sum c_i x_i$ is normally
distributed with mean $\sum c_i a_i$ and variance $\sum c_i^2 \sigma_i^2$.

In particular, the sum (or the difference) of two independent
normal variates is normally distributed, with variance equal to
the sum of their variances. Also, if in the above theorem we put

$$c_i = 1/n \quad (i = 1, 2, \ldots, n),$$

we have the important result:

If the independent variates $x_i$ (i = 1, ..., n) are normally distributed
about a common mean, a, with a common variance, $\sigma^2$, their mean is
also normally distributed about a, but with variance $\sigma^2/n$.
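The last result lends itself to a simple sampling experiment. The sketch below draws many sets of n independent normal values and examines the mean and variance of the resulting sample means; the values of a, σ, n, the number of trials and the random seed are all illustrative assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

a, sigma, n = 10.0, 2.0, 16  # illustrative common mean, s.d. and sample size
trials = 20_000

# Mean of n independent normal variates, repeated many times.
means = [statistics.fmean(random.gauss(a, sigma) for _ in range(n))
         for _ in range(trials)]

print("mean of the means:    ", statistics.fmean(means))      # near a
print("variance of the means:", statistics.pvariance(means))  # near sigma**2 / n
```

A simulation of this kind illustrates the theorem but does not prove it; the variance found will differ slightly from $\sigma^2/n$ from one seed to another.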

Other proofs of the above theorem will be found in Ex. 8 at the end
of this chapter.
EXAMPLES III

1. Show by direct calculation that the third moment of the
binomial distribution about x = 0 is

$$\mu_3' = np[(n-1)(n-2)p^2 + 3(n-1)p + 1],$$

and deduce that $\mu_3 = npq(q-p)$. Similarly, show that

$$\mu_4 = npq[1 + 3(n-2)pq].$$

2. Show that, for Poisson's distribution, the third and fourth
moments about x = 0 are given by

$$\mu_3' = m[(m+1)^2 + m], \qquad \mu_4' = m(m^3 + 6m^2 + 7m + 1),$$

and deduce that

$$\mu_3 = m, \qquad \mu_4 = 3m^2 + m.$$

Deduce these results also as limiting values of the corresponding
moments in Ex. 1.

3. In a normal distribution, whose mean is 2 and s.d. 3, find a
value of the variate such that the probability of the interval from
the mean to that value is 0.4115. (Ans. x = 6.05.) Find another
value such that the probability for the interval from x = 3.5 to
that value is 0.2307. (Ans. x = 6.26.)

4. If two independent variates, x and y, have Poissonian distributions
with means $m_1$ and $m_2$, their sum is a Poissonian variate with
mean $m_1 + m_2$. (Cf. § 18.)

The variates take only the values 0, 1, 2, 3, .... We require the
probability that their sum will take the value r. The probability
that simultaneously x will have the value s and y the value r - s is

$$\frac{m_1^s \exp(-m_1)}{s!} \cdot \frac{m_2^{r-s} \exp(-m_2)}{(r-s)!}.$$

Summing for all values of s from 0 to r, we see that the probability
that x + y will have the value r is

$$\exp(-m_1 - m_2) \sum_{s=0}^{r} \frac{m_1^s m_2^{r-s}}{s!\,(r-s)!} = \frac{(m_1 + m_2)^r \exp(-m_1 - m_2)}{r!}.$$

Consequently x + y is a Poissonian variate with mean $m_1 + m_2$. It
follows that, if the independent variates $x_i$ have Poissonian distributions
with means $m_i$ (i = 1, ..., n), their sum is a Poissonian variate
with mean $\sum m_i$.

5. Consider the normal distribution which has the same mean
(3.95 lb.) and the same s.d. (0.46 lb.) as the distribution of yields
of grain in Ex. I, 2. Find the relative frequencies of the normal
distribution for the intervals corresponding to the various classes,
and deduce the class frequencies per 500 of the normal distribution.
This process is sometimes spoken of as fitting a normal distribution
to the data.

Take, for example, the class whose limits are 4.1 and 4.3 lb.
Since the mean is 3.95 we have, for the lower limit, $x/\sigma = 0.15/0.46 = 0.3261$.
The area from the mean to this limit is therefore 0.1278,
and the area to the right of it is 0.3722. Similarly, for the upper
limit $x/\sigma = 0.35/0.46 = 0.7609$; and the area to the right of this is
0.2233. The difference of the areas to the right of the limits is that
corresponding to the interval. Its value is 0.1489. This is the relative
frequency for that interval.

The work can be tabulated as below. Here x denotes the deviation
of a class limit from the mean 3.95 lb. Knowing that $\sigma = 0.46$ lb.
we can find $x/\sigma$ for each limit. The third column gives the area under
the normal curve, to the right of the class limit. The differences,
recorded in the next column, are the relative frequencies for the
various classes. These differences, multiplied by 500, give the normal
frequencies of the classes per 500 of the total. The last column
records the class frequencies in the given distribution. The lower
limit of the lowest class is taken as $-\infty$, and the upper limit of the
highest class as $\infty$, so as to include the whole of the normal distribution.

Fitting a normal distribution to that of 500 yields of grain

Class      x/σ        Area        Difference    500d    Observed
limit                 to right    d                     f
 -∞        -∞         1.0000
                                  0.0112          5.6      4
 2.9     -2.2826      0.9888
                                  0.0212         10.6     16
 3.1     -1.8479      0.9676
                                  0.0464         23.2     20
 3.3     -1.4130      0.9212
                                  0.0851         42.6     47
 3.5     -0.9783      0.8361
                                  0.1295         64.7     63
 3.7     -0.5435      0.7066
                                  0.1633         81.6     78
 3.9     -0.1087      0.5433
                                  0.1712         85.6     88
 4.1      0.3261      0.3722
                                  0.1489         74.4     69
 4.3      0.7609      0.2233
                                                 53.7     69
 4.5      1.1956
                                  0.0644         32.2     36
 4.7      1.6304      0.0516
                                  0.0322         16.1     10
 4.9      2.0652      0.0194
                                  0.0132          6.6
 5.1      2.5000      0.0062
                                  0.0062          3.1
  ∞         ∞         0.0000
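The tabulated procedure can be sketched in a few lines. The class limits (2.9 to 5.1 lb. in steps of 0.2), the mean 3.95, the s.d. 0.46 and the total of 500 are those of the example; the function name `area_to_right` and the use of `math.erfc` for the tail area are choices made here for illustration.

```python
import math

def area_to_right(limit, mean, sd):
    """Area under the normal curve to the right of a class limit."""
    if limit == -math.inf:
        return 1.0
    if limit == math.inf:
        return 0.0
    return 0.5 * math.erfc((limit - mean) / (sd * math.sqrt(2)))

mean, sd, total = 3.95, 0.46, 500
limits = [-math.inf] + [round(2.9 + 0.2 * k, 1) for k in range(12)] + [math.inf]

areas = [area_to_right(c, mean, sd) for c in limits]
# Difference of successive areas = relative frequency of each class.
normal_freqs = [total * (areas[i] - areas[i + 1]) for i in range(len(limits) - 1)]

for lower, upper, freq in zip(limits, limits[1:], normal_freqs):
    print(f"{lower:>5} to {upper:<5} {freq:6.1f}")
```

Since the areas are computed directly rather than read from a four-figure table, the last digit of a frequency may differ slightly from the tabulated value.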


6. As in the previous example, fit a normal distribution to the
distribution of wages of 1,000 employees given in Ex. I, 3, showing
that the class frequencies per thousand of the normal distribution
are approximately 6.7, 11.3, 26.0, 48.0, 79.0, 113.1, 140.6, 151.0,
140.8, 113.6, 79.6, 48.1, 26.3, 11.6 and 6.7.

7. In 1,000 extensive sets of trials for an event of small probability,
the frequencies f of the numbers x of successes proved to be

x:    0    1    2    3    4    5    6    7
f:  306  366  210   80   28    9    2    1

Show that the mean number of successes is 1.2, and hence that the
frequencies of the Poissonian distribution, with the same mean and
the same total frequency, are approximately 301.2, 361.4, 216.8,
86.7, 26.0, 6.2, 1.2, 0.2. Verify that the variance of the given
distribution is 1.28.

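8. Sum of independent normal variates. Another proof of the
theorem of § 22 may be given as follows. Let x and y be the independent
normal variates, with a common mean zero and variances
$\sigma_1^2$ and $\sigma_2^2$ respectively; and let u = x + y. Then, for a fixed value of y,
du = dx. The probability that the value of x will fall in the interval
dx is $\phi(x)\,dx$; and therefore, for a fixed value of y in the interval dy,
the probability that u will fall in the interval du is

$$dp_1 = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left\{-\frac{(u-y)^2}{2\sigma_1^2}\right\} du.$$

But the probability of y falling in the interval dy is

$$dp_2 = \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left(-\frac{y^2}{2\sigma_2^2}\right) dy;$$

and the compound probability of u falling in the interval du and,
at the same time, y in the interval dy is the product $dp_1\,dp_2$.
Integrating with respect to y over its range of variation, we have the
probability that u will fall in the interval du, irrespective of the
value of y, as

$$\frac{du}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} \exp\left\{-\frac{(u-y)^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}\right\} dy.$$

In this expression the argument of the exponential may be put in
the form

$$-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)} - \frac{\sigma_1^2 + \sigma_2^2}{2\sigma_1^2\sigma_2^2} \left(y - \frac{u\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)^2.$$

If then we change the variable of integration from y to t, where

$$t = y - \frac{u\sigma_2^2}{\sigma_1^2 + \sigma_2^2},$$

we have

$$\frac{du}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}\right] \int_{-\infty}^{\infty} \exp\left\{-\frac{(\sigma_1^2 + \sigma_2^2)\,t^2}{2\sigma_1^2\sigma_2^2}\right\} dt
= \frac{du}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}} \exp\left[-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}\right],$$

in virtue of (10). Thus u is normally distributed about zero as
mean, with variance $\sigma_1^2 + \sigma_2^2$. The reader should have no difficulty
in removing the restriction of a common zero mean for the
variates.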
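The integration over y in this proof can be carried out numerically and the result compared with the normal density of variance $\sigma_1^2 + \sigma_2^2$. The following is a sketch under stated assumptions: the midpoint rule, the range of 12 standard deviations, and the values $\sigma_1 = 1.5$, $\sigma_2 = 2$ are illustrative choices only.

```python
import math

def normal_density(x, sd):
    """Normal probability density with mean zero and the given s.d."""
    return math.exp(-x * x / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def density_of_sum(u, sd1, sd2, half_width=12.0, steps=20_000):
    """Integrate phi_1(u - y) * phi_2(y) over y by the midpoint rule."""
    a = -half_width * sd2
    h = 2 * half_width * sd2 / steps
    total = 0.0
    for i in range(steps):
        y = a + (i + 0.5) * h
        total += normal_density(u - y, sd1) * normal_density(y, sd2)
    return total * h

sd1, sd2 = 1.5, 2.0                  # illustrative values
sd_sum = math.sqrt(sd1 ** 2 + sd2 ** 2)  # s.d. of the sum, here 2.5
for u in (0.0, 1.0, 3.0):
    print(u, density_of_sum(u, sd1, sd2), normal_density(u, sd_sum))
```

The two printed columns agree to many decimal places, as the completed square in the exponent leads one to expect.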

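The following is still another proof of the theorem. With the
same notation let

$$u = x + y, \qquad v = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y.$$

Then

$$\frac{u^2 + v^2}{\sigma_1^2 + \sigma_2^2} = \frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2},$$

and, as the values of x and y range from $-\infty$ to $\infty$, so do those of
u and v. The probability that x and y fall simultaneously in the
respective intervals dx and dy is

$$\frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2}\right)\right] dx\,dy,$$

so that the probability density for the joint distribution of x and y,
i.e. the probability per unit area at the point (x, y), is

$$\frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2}\right)\right]
= \frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{1}{2}\left(\frac{u^2 + v^2}{\sigma_1^2 + \sigma_2^2}\right)\right].$$

Now the area of the element of the xy plane, bounded by the curves
along which u has the values u and u + du respectively, and those
along which v has the values v and v + dv, is

$$\left|\frac{\partial(x, y)}{\partial(u, v)}\right| du\,dv = \frac{\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}\, du\,dv,$$

where $\partial(x, y)/\partial(u, v)$ is the Jacobian of x and y with respect to u and v.
Consequently the probability that, for a random choice of x and y,
the representative point (x, y) will fall in this element of area is

$$\frac{du\,dv}{2\pi(\sigma_1^2 + \sigma_2^2)} \exp\left[-\frac{1}{2}\left(\frac{u^2 + v^2}{\sigma_1^2 + \sigma_2^2}\right)\right]
= \frac{du}{\sigma\sqrt{2\pi}} \exp\left(-\frac{u^2}{2\sigma^2}\right) \cdot \frac{dv}{\sigma\sqrt{2\pi}} \exp\left(-\frac{v^2}{2\sigma^2}\right),$$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2$. Since this is the probability that simultaneously
u will lie in the interval du and v in the interval dv, it follows that
these variates are independent, and that each is normally distributed
with zero as mean and $\sigma_1^2 + \sigma_2^2$ as variance.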

9. Using the formulae of Ex. I, 11 show that, for the binomial
distribution, the factorial moments about the mean are

$$\mu_{(2)} = npq, \qquad \mu_{(3)} = -2npq(p+1),$$

and that, for the Poissonian distribution,

$$\mu_{(2)} = m, \qquad \mu_{(3)} = -2m, \qquad \mu_{(4)} = 3m(m+2).$$


MATHEMATICAL NOTES 
NOTE I 


Derivation of the Poissonian distribution (see §18) 

In the binomial distribution the probability of the value r of
the variate is $\binom{n}{r} p^r q^{n-r}$. We require the limiting value of this
expression when p = m/n and n tends to infinity, m being constant.
The expression may be written

$$\frac{n!}{r!\,(n-r)!} \left(\frac{m}{n}\right)^r \left(1 - \frac{m}{n}\right)^{n-r}
= \frac{m^r}{r!} \left(1 - \frac{m}{n}\right)^n \times \frac{n!}{(n-r)!\; n^r (1 - m/n)^r}.$$

The limiting value of that part which precedes the sign of multiplication
is $m^r e^{-m}/r!$. Then, using Stirling's formula for n!, viz.

$$n! \sim \sqrt{2\pi n}\; n^n e^{-n}$$

(the limiting value of the ratio of these two expressions, as n tends
to infinity, being unity), we have

$$\lim \frac{n!}{(n-r)!\; n^r (1 - m/n)^r}
= \lim \frac{\sqrt{2\pi n}\; n^n e^{-n}}{\sqrt{2\pi(n-r)}\; (n-r)^{n-r} e^{-(n-r)}\; n^r (1 - m/n)^r}
= \lim \frac{1}{e^r (1 - r/n)^{n-r+\frac{1}{2}} (1 - m/n)^r} = \frac{1}{e^r e^{-r}} = 1,$$

since r is finite. Consequently

$$\lim \binom{n}{r} p^r q^{n-r} = \frac{m^r e^{-m}}{r!},$$

which is the required probability of the value x = r in the Poissonian
distribution with mean m.
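The approach of the binomial probability to its Poissonian limit can be watched numerically by holding m fixed and letting n increase with p = m/n. In this sketch the values m = 1.5 and r = 2 are illustrative assumptions.

```python
import math

def binomial_pmf(n, p, r):
    """Binomial probability of the value r in n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(m, r):
    """Poissonian probability of the value r for mean m."""
    return math.exp(-m) * m ** r / math.factorial(r)

m, r = 1.5, 2  # illustrative values only
limit = poisson_pmf(m, r)
for n in (10, 100, 1_000, 10_000):
    value = binomial_pmf(n, m / n, r)
    print(n, value, abs(value - limit))
```

The final column, the discrepancy from the limit, falls roughly in proportion to 1/n.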


NOTE II 


Derivation of the normal distribution (see §19)

To examine the behaviour of the probability

$$P = \frac{n!\, p^{np+x}\, q^{nq-x}}{(np+x)!\,(nq-x)!}$$

as n tends to infinity, replace the factorials by their values given
by Stirling's formula (Note I). Then an easy algebraical simplification
leads to

$$P = \frac{1}{N\sqrt{2\pi npq}},$$

where

$$N = \left(1 + \frac{x}{np}\right)^{np+x+\frac{1}{2}} \left(1 - \frac{x}{nq}\right)^{nq-x+\frac{1}{2}}.$$

Taking logarithms of both sides we may write, provided |x| is less
than the smaller of the two quantities np and nq,

$$\log N = \frac{x}{2n}\left(\frac{1}{p} - \frac{1}{q}\right) + \frac{x^2}{2npq} - \frac{x^2}{4n^2}\left(\frac{1}{p^2} + \frac{1}{q^2}\right) + \frac{x^3}{6n^2}\left(\frac{1}{q^2} - \frac{1}{p^2}\right)$$

+ terms with higher powers of 1/n.

Introducing a new variable, $z = x/\sqrt{n}$, we may write the above

$$\log N = \frac{z}{2\sqrt{n}}\left(\frac{1}{p} - \frac{1}{q}\right) + \frac{z^2}{2pq} - \frac{z^2}{4n}\left(\frac{1}{p^2} + \frac{1}{q^2}\right) + \frac{z^3}{6\sqrt{n}}\left(\frac{1}{q^2} - \frac{1}{p^2}\right) + \ldots$$

This series is convergent so long as |z| is less than the smaller
of the quantities $p\sqrt{n}$ and $q\sqrt{n}$. Now let n tend to infinity. Then,
provided that neither p nor q is very small, and that |z| is either
finite or of lower order than $n^{1/6}$, all the terms of the above series tend
to zero except $z^2/2pq$; so that, when n tends to infinity, we have

$$\log N = \frac{z^2}{2pq}$$

or $N = \exp(z^2/2pq)$.

Corresponding to unit increment in x we have the increment $1/\sqrt{n}$
in z, which may be denoted by dz when n tends to infinity. And, if
we write dP for the limiting value of P, the above formula for P
becomes

$$dP = \frac{1}{\sqrt{2\pi pq}} \exp\left(-\frac{z^2}{2pq}\right) dz,$$

giving the probability of z falling in the interval dz; and we have
the required continuous distribution of z.


NOTE III 

An Important Integral 
To evaluate the integral

$$I = \int_0^{\infty} \exp(-x^2)\, dx,$$

let us write

$$I_a = \int_0^a \exp(-x^2)\, dx = \int_0^a \exp(-y^2)\, dy.$$

Then

$$I_a^2 = \int_0^a \int_0^a \exp(-x^2 - y^2)\, dx\, dy.$$

If we regard x, y as Cartesian coordinates in a plane, the integration
is extended over the area of the square OACB, bounded by x = 0,
x = a, y = 0, y = a (see Fig. 4). In polar coordinates the above
relation is

$$I_a^2 = \iint \exp(-r^2)\, r\, dr\, d\theta,$$

the integration being extended over the same square. Since the
integrand is positive, the integral is intermediate in value between
the integrals over the quadrants of circles, with centre O and radii
a and $a\sqrt{2}$ respectively. Consequently $I_a^2$ lies in value between

$$\int_0^{\frac{1}{2}\pi} d\theta \int_0^a r \exp(-r^2)\, dr \quad \text{and} \quad \int_0^{\frac{1}{2}\pi} d\theta \int_0^{a\sqrt{2}} r \exp(-r^2)\, dr.$$

But, as a tends to infinity, each of these integrals converges to $\frac{1}{4}\pi$.
Hence

$$I^2 = \lim I_a^2 = \tfrac{1}{4}\pi,$$

and therefore

$$\int_0^{\infty} \exp(-x^2)\, dx = \tfrac{1}{2}\sqrt{\pi}.$$

The substitution $x = z/(\sigma\sqrt{2})$ then gives

$$\int_0^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{\tfrac{1}{2}\pi},$$

and, since the integrand is an even function of z,

$$\int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{2\pi}.$$
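The sandwiching argument of this note can be illustrated numerically. Each quadrant integral has the closed value $\frac{1}{4}\pi\{1 - \exp(-R^2)\}$ for radius R, and the square of the integral from 0 to a should lie between the values for R = a and R = a√2. The midpoint rule and the step count below are assumptions made for the sketch.

```python
import math

def integral_to(a, steps=20_000):
    """Midpoint-rule value of the integral of exp(-x^2) from 0 to a."""
    h = a / steps
    return h * sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(steps))

def quadrant_integral(radius):
    """Integral of exp(-r^2) r dr dtheta over a quadrant of the given radius."""
    return (math.pi / 4) * (1 - math.exp(-radius ** 2))

for a in (0.5, 1.0, 2.0, 3.0):
    lower = quadrant_integral(a)
    upper = quadrant_integral(a * math.sqrt(2))
    print(a, lower, integral_to(a) ** 2, upper)

print("limit:", integral_to(6.0), "compared with", math.sqrt(math.pi) / 2)
```

As a increases, the two bounds close on $\frac{1}{4}\pi$ and the middle column is trapped between them.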




CHAPTER IV 


BIVARIATE DISTRIBUTIONS. 
REGRESSION AND CORRELATION 

23. Discrete distributions. Moments 

Suppose that we have records for N marriages, giving the ages of
bridegroom and bride, x and y years respectively. Then to each
marriage corresponds a pair of values $(x_i, y_i)$ of the variables. Each
pair may be represented by a point in the x, y plane, the coordinates
of the point being the pair of values represented. Such a
graphical representation is called a scatter diagram. Possibly the
pairs of values are not all different. If the pair $(x_i, y_i)$ occurs $f_i$ times,
then $f_i$ is the frequency of that pair; and, as in the case of a single
variable,

$$\sum_{i=1}^{n} f_i = N, \tag{1}$$

n being the number of different pairs of values. The assemblage of
pairs of values, together with their frequencies, constitutes a
bivariate frequency distribution. Other examples of bivariate distributions
are furnished by the heights and the weights of a group
of men, or by the amount of fertilizer per acre and the yield of grain
per acre on a number of different plots of land.

The moments of a bivariate distribution are generalizations of
those of a univariate one. Thus the moment about the origin
(0, 0), of order r in x and s in y, is defined by

$$\mu_{rs}' = \frac{1}{N} \sum_i f_i x_i^r y_i^s. \tag{2}$$

In particular

$$\mu_{10}' = \frac{1}{N} \sum_i f_i x_i = \bar{x}, \qquad \mu_{01}' = \frac{1}{N} \sum_i f_i y_i = \bar{y}, \tag{3}$$

where $\bar{x}$ is the mean value of x in the distribution, and $\bar{y}$ the mean
value of y. Similarly,

$$\mu_{20}' = \sigma_x^2 + \bar{x}^2, \qquad \mu_{02}' = \sigma_y^2 + \bar{y}^2, \tag{4}$$

where $\sigma_x^2$ is the variance of the variable x in the distribution, or the
moment $\mu_{20}$ about the mean $(\bar{x}, \bar{y})$; and likewise $\sigma_y^2$ is the variance
of y, or the moment $\mu_{02}$ about the mean. Lastly, we may mention
the moment $\mu_{11}'$, which is given by

$$\mu_{11}' = \frac{1}{N} \sum_i f_i x_i y_i.$$

The corresponding moment about the mean of the distribution
is called the covariance* of the variables. Thus

$$\mu_{11} = \frac{1}{N} \sum_i f_i (x_i - \bar{x})(y_i - \bar{y})
= \frac{1}{N} \sum_i f_i x_i y_i - \bar{x} \sum_i f_i y_i / N - \bar{y} \sum_i f_i x_i / N + \bar{x}\bar{y}
= \mu_{11}' - \bar{x}\bar{y} \tag{5}$$

in virtue of (1) and (3). This formula is very important, and will be
used frequently.
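Formula (5) is the form usually employed in computation. The sketch below evaluates the means, variances and covariance of a small discrete bivariate distribution from the moments about the origin; the data (ages of bridegroom and bride, with frequencies) are hypothetical and are not taken from the text.

```python
def bivariate_moments(pairs):
    """pairs: list of (x, y, f). Returns means, variances and covariance."""
    N = sum(f for _, _, f in pairs)
    mean_x = sum(f * x for x, _, f in pairs) / N
    mean_y = sum(f * y for _, y, f in pairs) / N
    var_x = sum(f * x * x for x, _, f in pairs) / N - mean_x ** 2    # from (4)
    var_y = sum(f * y * y for _, y, f in pairs) / N - mean_y ** 2    # from (4)
    cov = sum(f * x * y for x, y, f in pairs) / N - mean_x * mean_y  # formula (5)
    return mean_x, mean_y, var_x, var_y, cov

# Hypothetical (x, y, frequency) triples; not data from the text.
marriages = [(24, 22, 3), (26, 23, 5), (28, 27, 4), (31, 26, 2), (35, 30, 1)]

mean_x, mean_y, var_x, var_y, cov = bivariate_moments(marriages)
print(mean_x, mean_y, var_x, var_y, cov)
```

The same covariance is obtained, apart from rounding, by averaging the products of the deviations from the means, which is the definition from which (5) was derived.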

* Many writers denote this quantity by p. We are, however, loth to overwork
the symbol p by giving it still another meaning.

24. Continuous distributions

A continuous bivariate distribution includes all pairs of values
represented by points within a certain region of the xy plane, each
pair of values occurring at least once. The number of pairs of values
is therefore infinite. Distributions of the type ordinarily employed
have each for relative frequency density a continuous function f(x, y),
which is such that the relative frequency for the infinitesimal
rectangular region, of area dx dy, and defined by

$$x - \tfrac{1}{2}dx \le x \le x + \tfrac{1}{2}dx, \qquad y - \tfrac{1}{2}dy \le y \le y + \tfrac{1}{2}dy,$$

has the value f(x, y) dx dy. Thus f(x, y) is the relative frequency per
unit area at the point (x, y), and the sum of the relative frequencies
for all elements of the distribution is unity, so that

$$\iint f(x, y)\, dx\, dy = 1, \tag{6}$$

the integration extending over the region* of the plane which represents
the distribution. The mean $(\bar{x}, \bar{y})$ of the distribution is given by

$$\bar{x} = \iint x f(x, y)\, dx\, dy, \qquad \bar{y} = \iint y f(x, y)\, dx\, dy. \tag{7}$$

For higher moments we have formulae corresponding to those of
a discrete distribution. Thus

$$\sigma_x^2 = \mu_{20} = \iint (x - \bar{x})^2 f(x, y)\, dx\, dy, \tag{8}$$

which is equivalent to

$$\sigma_x^2 = \mu_{20}' - \bar{x}^2,$$

and similarly,

$$\sigma_y^2 = \mu_{02}' - \bar{y}^2. \tag{9}$$

The covariance, being the first product moment about the mean, is

$$\mu_{11} = \iint (x - \bar{x})(y - \bar{y}) f(x, y)\, dx\, dy
= \mu_{11}' - \bar{x} \iint y f(x, y)\, dx\, dy - \bar{y} \iint x f(x, y)\, dx\, dy + \bar{x}\bar{y}
= \mu_{11}' - \bar{x}\bar{y}, \tag{10}$$

in virtue of (6) and (7).

It will be sufficient, as a rule, to give proofs of theorems for a
discrete distribution. The student can easily rewrite these for the
case of a continuous distribution, replacing sums by definite
integrals, and the relative frequency $f_i/N$ by f(x, y) dx dy. Some of
the steps will be indicated in Examples.


* By defining f(x, y) as equal to zero outside this region, we may make the
range of integration $-\infty$ to $\infty$ for each variable.

25. Lines of regression

It frequently happens that the scatter diagram indicates an
association between the variables, x and y, the distribution of dots
being denser in the neighbourhood of a certain curve, which may
be called a curve of regression. The equation of such a curve indicates
a functional relationship to which the association of the variables
approximates, more or less roughly. We shall consider later the
problem of fitting different curves to the data, so as to obtain the
curve of a specified form which best fits the data. For the present
we confine our attention to the straight line which is the best fit, or
more definitely, to the problem of determining two straight lines,
one of which gives the closest estimate a straight line can give to
the average value of y for each specified value of x, while the other
gives the corresponding estimate of x for a given value of y.
These are called the lines of regression of y on x, and of x on y
respectively.

Consider first the line of regression of y on x. We have to determine
constants a and b so that the equation

$$y = a + bx \tag{11}$$

gives, for each value of x, the best estimate a linear equation can
give for the average value of y. We interpret the term 'best estimate'
in accordance with the principle of least squares; that is to say, we
find a and b so as to minimize the sum of the squares of the deviations
of the actual values of y in the distribution from their estimates
given by (11). Thus if $P_i$ is the point of the diagram representing the
pair of values $(x_i, y_i)$, and $H_i$ is the point on the straight line (11)
with the same abscissa $x_i$, the deviation of $P_i$ from its estimate is
$H_iP_i$, whose value is clearly $y_i - (a + bx_i)$. Thus we have to choose
a and b so as to make $\sum_i f_i (y_i - a - bx_i)^2$ a minimum, $f_i$ being the
frequency of the pair of values $(x_i, y_i)$. For a minimum value the
partial derivatives of the above expression with respect to a and b
must both be zero. Hence the equations for determining these
constants are

$$\sum_i f_i (y_i - a - bx_i) = 0, \qquad \sum_i f_i x_i (y_i - a - bx_i) = 0, \tag{12}$$

which are called the normal equations. In virtue of (3), (4) and (5)
they are equivalent to

$$\bar{y} - a - b\bar{x} = 0 \tag{13}$$

and

$$\mu_{11} + \bar{x}\bar{y} - a\bar{x} - b(\sigma_x^2 + \bar{x}^2) = 0. \tag{14}$$

The first of these shows that the required line passes through the
mean $(\bar{x}, \bar{y})$ of the distribution. Also, on eliminating a between
(13) and (14), we find

$$\mu_{11} - b\sigma_x^2 = 0.$$

Consequently the gradient b of the line of regression of y on x is

$$b = \mu_{11}/\sigma_x^2, \tag{15}$$

and, since the line passes through $(\bar{x}, \bar{y})$, its equation may be expressed

$$y - \bar{y} = \frac{\mu_{11}}{\sigma_x^2}(x - \bar{x}). \tag{16}$$

If the mean of the distribution is taken as origin, the line of regression
of y on x is

$$y = \frac{\mu_{11}}{\sigma_x^2}\,x. \tag{17}$$

The gradient $\mu_{11}/\sigma_x^2$ is often called the coefficient of regression of
y on x.

Similarly, or by interchanging variables, we find that the line of
regression of x on y is

$$x - \bar{x} = \frac{\mu_{11}}{\sigma_y^2}(y - \bar{y}), \tag{18}$$

and $\mu_{11}/\sigma_y^2$ is the coefficient of regression of x on y. The product of
the two coefficients of regression is symmetrical with respect to x
and y. Its square root, $\mu_{11}/\sigma_x\sigma_y$, is the coefficient of correlation, r.
Thus

$$r = \frac{\mu_{11}}{\sigma_x\sigma_y}, \tag{19}$$

having the same sign as the covariance $\mu_{11}$.

Example 1. Find the tangent of the inclination, $\theta$, of (18) to (16).
Since the gradients of the two lines are $\sigma_y/r\sigma_x$ and $r\sigma_y/\sigma_x$ we have

$$\tan\theta = \frac{\sigma_y/r\sigma_x - r\sigma_y/\sigma_x}{1 + \sigma_y^2/\sigma_x^2} = \frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}.$$

Example 2. For a continuous bivariate distribution the mean square
deviation from the line of regression of y on x is

$$\iint (y - a - bx)^2 f(x, y)\, dx\, dy.$$

By minimizing this for variation of a and b, show that the normal equations are

$$\iint (y - a - bx) f(x, y)\, dx\, dy = 0 = \iint x(y - a - bx) f(x, y)\, dx\, dy,$$

and that these are expressible in the forms (13) and (14).
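The determination of the two lines of regression and of r may be sketched as follows, using formulae (13), (15), (18) and (19). The data are hypothetical (x, y, frequency) triples, and the function name is an illustrative choice.

```python
import math

def regression_summary(pairs):
    """pairs: list of (x, y, f). Returns a, b of y = a + bx, the
    coefficient of regression of x on y, and the correlation r."""
    N = sum(f for _, _, f in pairs)
    mean_x = sum(f * x for x, _, f in pairs) / N
    mean_y = sum(f * y for _, y, f in pairs) / N
    var_x = sum(f * (x - mean_x) ** 2 for x, _, f in pairs) / N
    var_y = sum(f * (y - mean_y) ** 2 for _, y, f in pairs) / N
    cov = sum(f * (x - mean_x) * (y - mean_y) for x, y, f in pairs) / N
    b = cov / var_x                      # gradient of y on x, formula (15)
    a = mean_y - b * mean_x              # from (13)
    b_xy = cov / var_y                   # coefficient of regression of x on y
    r = cov / math.sqrt(var_x * var_y)   # formula (19)
    return a, b, b_xy, r

# Hypothetical data, not taken from the text.
pairs = [(1, 2, 2), (2, 3, 1), (3, 5, 3), (4, 4, 2), (5, 7, 2)]
a, b, b_xy, r = regression_summary(pairs)
print("y on x: y =", a, "+", b, "x")
print("coefficient of regression of x on y:", b_xy)
print("r:", r)
```

The product of the two coefficients of regression equals r squared, which gives a convenient check on the arithmetic.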


26. Coef&dent of correlation. Standard error of estimate 
The significance of the correlation coej£cient, r, as a measure of 
the closeness of the association of the variables x and y, will be 
apparent from the theorem now to be considered. The equation of 
the line of regression of yon x was found by minimizing the sum of 
the squares of the deviations HfPf. We shall now prove that the 
sum of the squares of these deviations from the line of regression of 
y on a; is equal to N<}*{1 — r*). 

Let the mean of the distribution be taken as origin, so that 
5 « I? = 0. Then, in virtue of (17), is equal to — 6®*; and the 

required sum of squares of the deviations is 

* ~Nal-2Nb/t^^+Nb^(f* 



^Nolil-1*), ( 20 ) 

as stated above. Denoting this sum of squares by NS^ we have 

-S*=(7t(l-r«) (21) 
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Since is the mean square devialaon of points from tiie line of 
regression of if on x, 8y is called the standard error of esUmaie of y 
from the regression equation (16). In the same way the sum of the 
squares of the deviations of points from the line of regression of x 
on y, measured parallel to the x-azis, is N8^, where 

8»=ai{l-r*), (23) 

and Sg. is the standard error of estimate of x from the regression 
equation (18). 

Since the sum of squares of deviations cannot be negative, it follows
from (20) that $r^2 \le 1$, or

$$-1 \le r \le 1. \tag{24}$$

If $r = 1$ or $-1$, the sum of squares of deviations from either line of
regression is zero. Consequently each deviation is zero, and all the
points lie on both lines of regression. These two lines then coincide,
and there is a linear functional relation between the variables x
and y, giving perfect correlation. The nearer $r^2$ is to unity, the closer
are the points to the lines of regression, and the nearer are these two
lines to coincidence (cf. § 25, Ex. 1). Thus the magnitude of r may be
taken as a measure of the degree to which the association between the
variables approaches a linear functional relationship. The sign of r
is the same as that of the covariance $\mu_{11}$, and therefore also the same
as that of the gradients of the lines of regression. Hence r is positive
when, on the whole, y increases with x, and negative when y decreases
as x increases. When r is zero the variables are usually described as
uncorrelated.
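
The theorem expressed by (20) and the bounds (24) may be checked numerically. The following Python sketch is illustrative only: the data are hypothetical, every frequency is taken as unity, and the names `moments`, `ss_resid` and `ss_formula` are introduced here for the purpose and do not occur in the text.

```python
import math

# Hypothetical ungrouped sample (each frequency f_i = 1); illustrative values only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

def moments(xs, ys):
    """Return N, the means, the variances and the covariance (divisor N)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return n, mx, my, var_x, var_y, cov

n, mx, my, var_x, var_y, cov = moments(xs, ys)
b = cov / var_x                      # regression coefficient of y on x
a = my - b * mx                      # the line passes through the mean
r = cov / math.sqrt(var_x * var_y)   # coefficient of correlation

# Sum of squared deviations from the line of regression of y on x.
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_formula = n * var_y * (1 - r ** 2)   # right-hand member of (20)
print(ss_resid, ss_formula)
```

The two printed numbers should agree to rounding error, whatever data are supplied, provided the x's are not all equal.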

The coefficient of correlation between two variables is also the
coefficient of correlation between the deviations of the variables
from their means. For the value of r depends only on $\mu_{11}$, $\sigma_x$ and $\sigma_y$,
and these are functions of the above deviations. Thus

$$r = \frac{\sum_i f_i(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i f_i(x_i-\bar{x})^2}\,\sqrt{\sum_i f_i(y_i-\bar{y})^2}},$$

and, in virtue of (3), (4) and (5), this may be expressed

$$r = \frac{(\sum_i f_ix_iy_i)/N - \bar{x}\bar{y}}{\sqrt{\{(\sum_i f_ix_i^2)/N - \bar{x}^2\}\{(\sum_i f_iy_i^2)/N - \bar{y}^2\}}}. \tag{25}$$

This formula is sometimes convenient.

Example. Taking the mean as origin for a continuous distribution, show
that the mean square deviation, $S_y^2$, from the line of regression of y on x
is given by

$$S_y^2 = \iint (y-bx)^2 f(x,y)\,dx\,dy = \sigma_y^2 - 2b\mu_{11} + b^2\sigma_x^2 = \sigma_y^2(1-r^2).$$


27. Estimates from the regression equation 
A few simple relations between the $y_i$ and their estimates, $Y_i$,
given by the regression equation (11) or (16), play an important
part in later work. Thus

$$Y_i = a + bx_i, \tag{26}$$

and the normal equations (12) are expressible as

$$\sum_i f_i(y_i - Y_i) = 0 \tag{27}$$

and

$$\sum_i f_ix_i(y_i - Y_i) = 0. \tag{28}$$

From (27) it follows that the mean of the Y's is equal to the mean
$\bar{y}$ of the y's. Also, on multiplying (27) and (28) by a and b respectively
and adding, we deduce

$$\sum_i f_iY_i(y_i - Y_i) = 0, \tag{29}$$

and therefore, in virtue of (27),

$$\sum_i f_i(y_i - Y_i)(Y_i - \bar{y}) = 0. \tag{30}$$

This relation is very important.

The sum of the squares of the deviations of the y's from their
mean may then be expressed

$$\sum_i f_i(y_i-\bar{y})^2 = \sum_i f_i[(y_i-Y_i)+(Y_i-\bar{y})]^2 = \sum_i f_i(y_i-Y_i)^2 + \sum_i f_i(Y_i-\bar{y})^2, \tag{31}$$

since the sum of the products vanishes by (30). Now the first
member of (31) has the value $N\sigma_y^2$. The first sum in the second
member has been proved equal to $N\sigma_y^2(1-r^2)$. Consequently

$$\sum_i f_i(Y_i-\bar{y})^2 = Nr^2\sigma_y^2, \tag{32}$$

showing that the variance of the Y's is $r^2$ times that of the y's; or

$$\sigma_Y^2 = r^2\sigma_y^2, \qquad \sigma_Y = |r|\,\sigma_y. \tag{33}$$

Finally, we may show that the coefficient of correlation between
the $y_i$ and their estimates $Y_i$ is equal to $|r|$. Take the origin at the
common mean of these variables, so that $\bar{y} = \bar{Y} = 0$. Then, in virtue
of (29),

$$\sum_i f_iy_iY_i = \sum_i f_iY_i^2 = N\sigma_Y^2,$$

and the coefficient of correlation between the y's and the Y's is

$$\frac{\sum_i f_iy_iY_i}{N\sigma_y\sigma_Y} = \frac{\sigma_Y^2}{\sigma_y\sigma_Y} = \frac{\sigma_Y}{\sigma_y} = |r|. \tag{34}$$
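
The relations (27), (30), (33) and (34) may be verified on a small set of figures. The sketch below is an illustration under stated assumptions: the data are hypothetical (chosen with a falling trend so that r is negative), each frequency is unity, and the variable names are introduced here only.

```python
import math

# Hypothetical ungrouped data with a negative trend (each frequency f_i = 1).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [9.0, 7.5, 8.0, 5.5, 4.0, 4.5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

b = cov / var_x
a = y_bar - b * x_bar
r = cov / math.sqrt(var_x * var_y)

# Estimates Y_i from the regression equation, as in (26).
Ys = [a + b * x for x in xs]
Y_bar = sum(Ys) / n

# Relation (30): sum of products of (y - Y) and (Y - y_bar).
cross = sum((y - Y) * (Y - y_bar) for y, Y in zip(ys, Ys))

# Relation (33): variance of the estimates.
var_Y = sum((Y - Y_bar) ** 2 for Y in Ys) / n

# Relation (34): correlation between the y's and their estimates.
cov_yY = sum((y - y_bar) * (Y - Y_bar) for y, Y in zip(ys, Ys)) / n
r_yY = cov_yY / math.sqrt(var_y * var_Y)

print(r, r_yY)
```

Although r is negative for these figures, the correlation between the y's and the Y's comes out positive and equal to $|r|$, as (34) requires.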

Example 1. Prove (34) as follows. Taking the mean of the distribution as
origin, so that $Y_i = bx_i$, $\sigma_Y = |b|\,\sigma_x$, we have for the required correlation
coefficient

$$\frac{\sum_i f_iy_iY_i}{N\sigma_y\sigma_Y} = \frac{b\sum_i f_ix_iy_i}{N\,|b|\,\sigma_x\sigma_y} = \frac{br}{|b|} = |r|,$$

since r and b have the same sign.

Example 2. Show that the normal equations for a continuous distribution
(cf. § 25, Ex. 2) are expressible as

$$\iint (y-Y)f(x,y)\,dx\,dy = 0, \qquad \iint x(y-Y)f(x,y)\,dx\,dy = 0,$$

and from these deduce the relations

$$\iint Y(y-Y)f(x,y)\,dx\,dy = 0, \qquad \iint (y-Y)(Y-\bar{y})f(x,y)\,dx\,dy = 0$$

corresponding to (29) and (30). Hence show that

$$\iint (y-\bar{y})^2f(x,y)\,dx\,dy = \iint (y-Y)^2f(x,y)\,dx\,dy + \iint (Y-\bar{y})^2f(x,y)\,dx\,dy,$$

and deduce that

$$\iint (Y-\bar{y})^2f(x,y)\,dx\,dy = r^2\sigma_y^2.$$


28. Change of units 

Before illustrating the above theory by a numerical example, let
us examine the effect of a change of units on the calculation of r,
$\mu_{11}$ and b. As in § 2, let u be the measure of the deviation of x from an
origin x = a, in terms of a unit c times the original x-unit, so that
x = a + cu. Then, if x' and u' are the deviations of the two variables
from their means, x' = cu', and $\sigma_x = c\sigma_u$.

Similarly, if v is the measure of the deviation of y from an origin
y = a', in terms of a unit c' times the original y-unit, y = a' + c'v,
y' = c'v' and $\sigma_y = c'\sigma_v$. Then the coefficient of correlation between
x and y is

$$r = \frac{\sum_i f_ix'_iy'_i/N}{\sigma_x\sigma_y} = \frac{cc'\sum_i f_iu'_iv'_i/N}{cc'\sigma_u\sigma_v} = \frac{\text{covariance of } u \text{ and } v}{\sigma_u\sigma_v},$$

and is thus equal to the correlation between u and v. The value of r
is therefore independent of the units employed, and is thus an
absolute measure of correlation. But the covariance of x and y is

$$\mu_{11} = \sum_i f_ix'_iy'_i/N = cc'\sum_i f_iu'_iv'_i/N = cc'\,(\text{covariance of } u \text{ and } v). \tag{35}$$

Similarly, the regression coefficient of y on x is

$$\frac{\mu_{11}}{\sigma_x^2} = \frac{c'}{c}\,(\text{regression coefficient of } v \text{ on } u). \tag{36}$$

If then c and c' are equal, the value of b is the same for both pairs of
variables.
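
The effect of a change of origin and unit may be seen in a short computation. The sketch below is illustrative only: the coded values u, v and the constants a, c, a', c' are hypothetical, and the helper `summary` is a name introduced here.

```python
import math

def summary(xs, ys):
    """Return (r, covariance, regression coefficient of the second variable on the first)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(var_x * var_y), cov, cov / var_x

# Hypothetical coded values u, v and illustrative constants a, c, a', c'.
us = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
vs = [-1.0, -2.0, 0.0, 0.0, 2.0, 1.0]
a, c = 27.5, 5.0
a_prime, c_prime = 22.5, 2.0

xs = [a + c * u for u in us]                # x = a + c u
ys = [a_prime + c_prime * v for v in vs]    # y = a' + c' v

r_uv, cov_uv, b_vu = summary(us, vs)
r_xy, cov_xy, b_yx = summary(xs, ys)
print(r_uv, r_xy)
```

The coefficient of correlation is unchanged, while the covariance is multiplied by cc' and the regression coefficient by c'/c, in agreement with (35) and (36).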

29. Numerical Illustration 

For 1,000 marriages the ages of bridegroom and bride, x and y years
respectively, are grouped in the table below with class interval of 5 years for
each, the frequencies for the different classes being shown in the body of the
table. Find the regression equations and the coefficient of correlation between
the variables.

Such a table is called a correlation table. The values of x and y indicated
are the mid-values in the classes. Thus for the class in which the age of the
bridegroom is between 25 and 30, and the age of the bride between 20 and 25,
the values of x and y are taken as 27.5 and 22.5 respectively, and the frequency
is 190. The data represented in any one column constitute a vertical array,
or an array of y's, because in each such array y assumes different values
while x remains constant. Similarly, the data in any row constitute a horizontal
array, or an array of x's. In the row prefixed f the frequencies
are given for the individual columns, and in the column headed f the
frequencies for the separate rows.

Let us take as new origin the point (27.5, 27.5), whose coordinates are
the mid-values of the class 25 to 30 for x and y; and, as a new unit
for each variable, the common class interval of 5 years. Then the deviations,
u and v, of the ages of bridegroom and bride from the new origin in terms of
the new unit are those shown in the second row and the second column. The
fourth row from the bottom of the table gives the sum $\sum fu$
for each column; and these are added horizontally to give the total sum 124
for the distribution. The next row gives the sum $\sum fu^2$ for each column, with
total sum 1,834 for the distribution. The row prefixed V gives for each
column the sum $\sum fv$. Thus the sum $-28$, in the column $u = -2$, is obtained
from

$$11(-2) + 6(-1) = -28.$$

The last row gives the sum $\sum fuv$ for each column, each entry being obtained
by multiplying the value of V above by the common value of u for that
column. Summing horizontally all the values uV, we have the product sum
$\sum fuv$ for the whole distribution, viz. 1,109.

The columns to the right of the table are explained similarly. That headed
U gives for each row the sum $\sum fu$. Thus the first entry $-79$ in the column is
obtained from

$$11(-2) + 62(-1) + 19 \times 0 + 3 \times 1 + 1 \times 2 = -79.$$

The last column, headed vU, gives the sum $\sum fuv$ for each row, any entry
being obtained by multiplying the value of U to the left by the common value
of v for that row. Summing the entries in this column we find again the
product sum $\sum fuv$ for the whole distribution, thus providing a check on the
calculations.

Using the values thus obtained we have

$$\bar{u} = 124/1000 = 0.124, \qquad \bar{v} = -371/1000 = -0.371.$$

Therefore

$$\bar{x} = 27.5 + 5(0.124) = 28.120, \qquad \bar{y} = 27.5 - 5(0.371) = 25.645,$$

giving the mean ages of bridegroom and bride. The variance of u is

$$\sigma_u^2 = 1.834 - (0.124)^2 = 1.8186,$$

so that $\sigma_u = 1.349$, or 1.35 nearly, and therefore $\sigma_x = 6.745$, or 6.75 nearly.

Similarly, we find

$$\sigma_v^2 = 1.531 - (0.371)^2 = 1.3934, \qquad \sigma_v = 1.18; \qquad \sigma_y = 5.90.$$

The coefficient of correlation between x and y is equal to that between u
and v. Thus

$$r = \frac{\sum fuv/N - \bar{u}\bar{v}}{\sigma_u\sigma_v} = \frac{1.109 + 0.371 \times 0.124}{1.349 \times 1.180} = 0.726, \text{ or } 0.73 \text{ nearly}.$$

Since the units for u and v are equal, the regression coefficient of y on x is
equal to that of v on u, which is

$$(\text{covariance of } u \text{ and } v)/\sigma_u^2 = 1.155/1.8186 = 0.635.$$

The line of regression of y on x is therefore

$$y - 25.645 = 0.635(x - 28.12),$$

that is

$$y = 7.79 + 0.635x.$$

The student may find similarly that the line of regression of x on y is

$$x = 6.86 + 0.829y.$$
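
The arithmetic of this illustration may be repeated from the five totals quoted in the text ($\sum fu = 124$, $\sum fv = -371$, $\sum fu^2 = 1834$, $\sum fv^2 = 1531$, $\sum fuv = 1109$, with N = 1000). The Python sketch below assumes only those totals, the origin (27.5, 27.5) and the 5-year unit; the variable names are introduced here and the correlation table itself is not reproduced.

```python
import math

# Totals quoted in the text for the coded variables u and v.
N = 1000
sum_fu, sum_fv = 124, -371
sum_fu2, sum_fv2 = 1834, 1531
sum_fuv = 1109
origin, unit = 27.5, 5.0   # new origin and common class interval

u_bar = sum_fu / N
v_bar = sum_fv / N
x_bar = origin + unit * u_bar
y_bar = origin + unit * v_bar

var_u = sum_fu2 / N - u_bar ** 2
var_v = sum_fv2 / N - v_bar ** 2
cov_uv = sum_fuv / N - u_bar * v_bar

r = cov_uv / math.sqrt(var_u * var_v)

# Equal units for u and v, so these are also the coefficients for y on x and x on y.
b_yx = cov_uv / var_u
b_xy = cov_uv / var_v
a_yx = y_bar - b_yx * x_bar   # intercept of the line of regression of y on x
a_xy = x_bar - b_xy * y_bar   # intercept of the line of regression of x on y

print(round(r, 3), round(b_yx, 3), round(a_yx, 2), round(b_xy, 3), round(a_xy, 2))
```

Carried out in this way the computation gives r about 0.726 and regression coefficients about 0.635 and 0.829, the small differences in later decimals arising from the rounding of intermediate values in hand calculation.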


30. Correlation of ranks

A group of n individuals may be arranged in order of merit or
proficiency in the possession of a certain characteristic. The same
group would, as a rule, give different orders for different
characteristics. Considering the orders corresponding to two characteristics,
A and B, let $x_i$, $y_i$ be the ranks of the ith individual in A and B
respectively. Then the coefficient of correlation between the x's and
the y's is called the rank correlation coefficient in the characteristics
A and B for that group of individuals. On the assumption that no
two individuals are bracketed equal in either classification, each of
the variables takes the values 1, 2, 3, ..., n; and therefore

$$\bar{x} = \tfrac{1}{2}(n+1) = \bar{y}. \tag{37}$$

As a rule $x_i$ is not equal to $y_i$. Let $d_i$ denote the difference, so that

$$d_i = x_i - y_i. \tag{38}$$

Then, if x' and y' denote the deviations of the variables from their
means, we have also

$$d_i = x'_i - y'_i. \tag{39}$$


The coefficient of correlation between the variables is given by

$$r = \frac{\sum x'_iy'_i}{\sqrt{(\sum x_i'^2)(\sum y_i'^2)}}. \tag{40}$$

To express this in terms of n and the differences, $d_i$, we observe that
the variance of each of the variables x and y is $(n^2-1)/12$ (cf. § 3,
Ex. 2). Therefore

$$\sum x_i'^2 = \sum y_i'^2 = (n^3-n)/12. \tag{41}$$

Also

$$\sum d_i^2 = \sum x_i'^2 + \sum y_i'^2 - 2\sum x'_iy'_i,$$

and thus, in virtue of (41),

$$\sum x'_iy'_i = \frac{n^3-n}{12} - \frac{1}{2}\sum d_i^2.$$

Substitution of these values in (40) gives

$$r = 1 - \frac{6\sum d_i^2}{n^3-n}. \tag{42}$$

This is the required formula for the coefficient of correlation of
ranks.

If the correlation is perfect all the d's are zero, and r = 1. If the
orders in the two characteristics are exactly the reverse of each
other, $x_i + y_i = n + 1$. All the points of the scatter diagram then lie
on a straight line with negative gradient. Consequently $r = -1$,
and there is perfect inverse correlation.


Example. The ranks of the same 16 students in Mathematics and Physics
were as follows, two numbers within brackets denoting the ranks of the same
student in Mathematics and Physics respectively: (1, 1), (2, 10), (3, 3), (4, 4),
(5, 5), (6, 7), (7, 2), (8, 6), (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12),
(15, 16), (16, 13). Calculate the rank correlation coefficient for proficiencies
of this group in Mathematics and Physics.

Here $d_i$ is the difference between the two numbers in the ith pair of brackets.
It is easily verified that $\sum d_i^2 = 136$, $n^3 - n = 16 \times 255$. Consequently

$$r = 1 - \frac{6 \times 136}{16 \times 255} = 1 - \tfrac{1}{5} = 0.8.$$
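
The calculation of this Example may be repeated mechanically. The Python sketch below is illustrative: the sixteen pairs are those of the Example, read so that each set of ranks runs from 1 to 16 and $\sum d_i^2 = 136$, while the function names `rank_correlation` and `product_moment` are introduced here. The second function applies (40) directly to the ranks, as a check on (42).

```python
import math

# Ranks (Mathematics, Physics) of the 16 students in the Example.
pairs = [(1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6),
         (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12), (15, 16), (16, 13)]

def rank_correlation(pairs):
    """Formula (42): r = 1 - 6 * sum(d^2) / (n^3 - n), assuming no tied ranks."""
    n = len(pairs)
    sum_d2 = sum((x - y) ** 2 for x, y in pairs)
    return 1 - 6 * sum_d2 / (n ** 3 - n)

def product_moment(pairs):
    """Formula (40): the ordinary correlation coefficient applied to the ranks."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

sum_d2 = sum((x - y) ** 2 for x, y in pairs)
r_rank = rank_correlation(pairs)
r_pm = product_moment(pairs)
print(sum_d2, r_rank, r_pm)
```

The agreement of the two values depends on the absence of ties; with bracketed ranks formula (41) no longer holds exactly and (42) becomes an approximation.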


31. Bivariate probability distributions 
The theorems proved for a bivariate frequency distribution in
§§ 23-28 hold equally for a probability distribution of two variates,
relative frequency in the former case being replaced by probability
in the latter. Thus, for a discrete distribution, if $p_i$ is the
probability of the occurrence of the pair of values $(x_i, y_i)$, the moment
$\mu'_{rs}$ about the origin is

$$\mu'_{rs} = \sum_i p_ix_i^ry_i^s,$$

which is the expected value of the product $x^ry^s$. In particular the
expected values of x and y are

$$E(x) = \sum_i p_ix_i, \qquad E(y) = \sum_i p_iy_i,$$

while, corresponding to (4) and (5), we have

$$\sigma_x^2 = \mu_{20} = \mu'_{20} - [E(x)]^2, \qquad \sigma_y^2 = \mu_{02} = \mu'_{02} - [E(y)]^2$$

and

$$\mu_{11} = \mu'_{11} - E(x)E(y),$$

$\mu_{11}$ being the covariance of the variates, which is the expected value
of the product of their deviations from their means. The coefficient
of correlation is defined by (19), which may here be expressed in the
alternative form

$$r = \frac{E(x'y')}{\sigma_x\sigma_y} = \frac{E(x'y')}{\sqrt{E(x'^2)\,E(y'^2)}}, \tag{19'}$$

x', y' being the deviations of the variates from their expected values.

In the case of continuous probability distributions we confine
our attention to those in which the probability that the variates
will fall simultaneously in the intervals dx and dy is expressible in
the form $\phi(x,y)\,dx\,dy$, the probability density $\phi(x,y)$ being
continuous and essentially positive. Then we have formulae
corresponding to (6)-(10), with $\phi(x,y)$ in place of f(x,y).

The variates are independent if the probability distribution of
each is independent of the value assumed by the other. In particular,
if the variates are continuous, with probability densities $\phi_1(x)$ and
$\phi_2(y)$ respectively, the probability that x will fall in the interval dx,
and at the same time y will fall in the interval dy, is $\phi_1(x)\,dx\,\phi_2(y)\,dy$
by the theorem of compound probability. Then the probability
density $\phi(x,y)$ for the bivariate distribution is of the form $\phi_1(x)\phi_2(y)$.
Conversely, when this relation holds, the continuous variates are
independent. Now it was proved in §§ 10 and 12 that the covariance
of two independent variates is equal to zero. It follows from the
above definition of r that the coefficient of correlation of two independent
variates is equal to zero. The converse of this theorem, however, is
not necessarily true; that is to say, uncorrelated variables are not
necessarily independent.
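
A simple instance shows that the converse fails. In the Python sketch below, which is an illustration not taken from the text, x takes the values -1, 0, 1 with equal probabilities and y is the square of x, so that y is completely determined by x; the helper `expect` is a name introduced here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Hypothetical discrete distribution: x is -1, 0 or 1 with equal probability, y = x^2.
p = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expect(g):
    """Expected value of g(x, y) under the distribution p."""
    return sum(prob * g(x, y) for (x, y), prob in p.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - ex) * (y - ey))

# Marginal probabilities of x = 0 and of y = 0, and the joint probability of both.
p_x0 = sum(prob for (x, _), prob in p.items() if x == 0)
p_y0 = sum(prob for (_, y), prob in p.items() if y == 0)
p_joint = p[(0, 0)]

print(cov, p_joint, p_x0 * p_y0)
```

The covariance, and therefore r, is zero; yet the joint probability 1/3 differs from the product 1/9 of the separate probabilities, so that the variates are not independent.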

The moment generating function of the bivariate distribution is
defined as the expected value of the function $\exp(t_1x + t_2y)$, where
$t_1$ and $t_2$ are independent of x and y. Thus, in the case of a
continuous distribution, the m.g.f. with respect to the origin is

$$M(t_1, t_2) = \iint \exp(t_1x + t_2y)\,\phi(x,y)\,dx\,dy.$$

When this integral has a meaning the exponential may be expanded
in powers of $t_1$ and $t_2$, and the coefficient of $t_1^rt_2^s/(r!\,s!)$ is the moment
$\mu'_{rs}$ about the origin. In Ex. 3 at the end of Chapter V we shall find
this function for the bivariate normal distribution, and use it to
calculate the moments.

When the variates x and y are independent we have

$$M(t_1, t_2) = E[\exp(t_1x + t_2y)] = E[\exp(t_1x)]\,E[\exp(t_2y)] = (\text{function of } t_1) \times (\text{function of } t_2).$$

And conversely, when the m.g.f. is of this form, the variates are
independent.


32. Variance of a sum of variates 

Consider the variates x and y, with standard deviations $\sigma_x$ and
$\sigma_y$ respectively. Their distributions determine a bivariate
probability distribution for the two variates, with correlation coefficient
r given by (19'); and each value that may be taken by their sum

$$u = x + y \tag{43}$$

has a definite probability. Then, in virtue of § 10 (10),

$$E(u) = E(x) + E(y), \tag{44}$$

and therefore, if x', y', u' are the deviations of the variates from
their means, we obtain from (43) and (44) by subtraction

$$u' = x' + y'.$$

Consequently

$$u'^2 = x'^2 + y'^2 + 2x'y',$$

and, on taking the expected value of each member we deduce, in
virtue of (19'),

$$\sigma_u^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y. \tag{45}$$

This is the required formula for the variance of the sum of the
variates. And, since $-r$ is the correlation between x and $-y$, it
follows that the variance of the difference

$$v = x - y$$

is given by

$$\sigma_v^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y. \tag{46}$$

If r is zero, as in the case of independence of x and y, we have the
simple result

$$\sigma_u^2 = \sigma_v^2 = \sigma_x^2 + \sigma_y^2, \tag{47}$$

already proved in §§ 10, 15 and 16.
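
Formulae (45) and (46) hold identically for the moments of any set of paired values, and may be checked as follows. The Python sketch is an illustration with hypothetical figures, all pairs being given equal weight; the functions `variance` and `correlation` are names introduced here.

```python
import math

# Hypothetical paired values of two correlated variates (equal weights).
xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [1.0, 3.0, 2.0, 6.0, 5.0]

def variance(values):
    """Mean square deviation from the mean (divisor N)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def correlation(xs, ys):
    """Coefficient of correlation of the paired values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(variance(xs) * variance(ys))

r = correlation(xs, ys)
sx, sy = math.sqrt(variance(xs)), math.sqrt(variance(ys))

var_sum = variance([x + y for x, y in zip(xs, ys)])    # variance of u = x + y
var_diff = variance([x - y for x, y in zip(xs, ys)])   # variance of v = x - y

rhs_sum = sx ** 2 + sy ** 2 + 2 * r * sx * sy    # right-hand member of (45)
rhs_diff = sx ** 2 + sy ** 2 - 2 * r * sx * sy   # right-hand member of (46)
print(var_sum, rhs_sum, var_diff, rhs_diff)
```

With positive correlation, as here, the variance of the sum exceeds that of the difference by $4r\sigma_x\sigma_y$.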


More generally, let u be a linear function of the variates x, y, z, ...,
so that

$$u = ax + by + cz + \dots, \tag{48}$$

the constants a, b, c, ... being either positive or negative. Then

$$E(u) = aE(x) + bE(y) + cE(z) + \dots$$

and therefore by subtraction

$$u' = ax' + by' + cz' + \dots,$$

the primes indicating that the variates are measured from their
means. On squaring both sides, and taking expected values, we
obtain the required formula

$$\sigma_u^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + c^2\sigma_z^2 + \dots + 2abr_{xy}\sigma_x\sigma_y + \dots, \tag{49}$$

in which $r_{xy}$ is the coefficient of correlation between x and y.

EXAMPLES IV

1. Show that, if a and b are constants and r is the correlation
between x and y, then the correlation between ax and by is equal to
r if the signs of a and b are alike, and to $-r$ if they are different.

Also show that, if the constants a, b, c are positive, the correlation
between (ax + by) and cy is equal to

$$(ar\sigma_x + b\sigma_y)\big/\sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2abr\sigma_x\sigma_y}.$$

2. The variables x and y are connected by the equation
ax + by + c = 0. Show that the correlation between them is $-1$ if
the signs of a and b are alike, and $+1$ if they are different.

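3. Show that, if x', y' are the deviations of the variables from
their means,

$$r = 1 - \frac{1}{2N}\sum\left(\frac{x'_i}{\sigma_x} - \frac{y'_i}{\sigma_y}\right)^2$$

and

$$r = -1 + \frac{1}{2N}\sum\left(\frac{x'_i}{\sigma_x} + \frac{y'_i}{\sigma_y}\right)^2,$$

and deduce that $-1 \le r \le 1$. (Rietz, 1927, 2, p. 84.)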

4. The variates x and y have zero means, the same variance $\sigma^2$
and zero correlation. Show that

$$(x\cos\alpha + y\sin\alpha) \quad\text{and}\quad (x\sin\alpha - y\cos\alpha)$$

have the same variance $\sigma^2$ and zero correlation.

5. Weighted mean with minimum variance. Let $x_i$ (i = 1, ..., n)
be n independent variates with variances $\sigma_i^2$. If the variates are
given weights $w_i$, their weighted mean is

$$\sum_i w_ix_i \Big/ \sum_i w_i = \sum_i c_ix_i, \qquad\text{where } c_i = w_i \Big/ \sum_i w_i.$$

We can show that the variance of this weighted mean is least when
the weights are inversely proportional to the variances $\sigma_i^2$. In virtue
of (49) the variance of the weighted mean is

$$\sigma^2 = \sum_i c_i^2\sigma_i^2 = \Big(\sum_i w_i^2\sigma_i^2\Big) \Big/ \Big(\sum_i w_i\Big)^2.$$

For a minimum $\sigma^2$ the partial derivatives of this with respect to the
$w_j$ must be zero. This requires

$$\Big(\sum_i w_i\Big)^2 w_j\sigma_j^2 - \Big(\sum_i w_i^2\sigma_i^2\Big)\Big(\sum_i w_i\Big) = 0,$$

so that

$$w_j\sigma_j^2 = \sum_i w_i^2\sigma_i^2 \Big/ \sum_i w_i,$$

showing that $w_j\sigma_j^2$ is the same for all values of j. Thus the weights
of the variates are inversely proportional to their variances.

Show that this minimum variance is equal to H/n, where H is
the harmonic mean of the variances $\sigma_i^2$.

6. For a given bivariate distribution find the straight line for
which the sum of the squares of the normal deviations is a minimum.

Let the straight line be $x\cos\alpha + y\sin\alpha = p$. Then we have to
minimize the sum of squares $\sum_i f_i(x_i\cos\alpha + y_i\sin\alpha - p)^2$ by equating
to zero its partial derivatives with respect to p and $\alpha$. Show, from
the first of the equations thus obtained, that the required line passes
through the mean of the distribution. Then, taking the mean as
origin, show from the second equation that $\alpha$ is given by

$$\tan 2\alpha = \frac{2\mu_{11}}{\sigma_x^2 - \sigma_y^2}.$$

Of the two directions at right angles, found from this equation, one
makes the sum of squares a minimum, and the other a maximum,
for lines through the mean of the distribution. (L. J. Reed, Metron,
vol. I, 1921, part 3, pp. 54-61.)


7. The ranks of the same 15 students in Mathematics and Latin
were as follows, the two numbers within brackets denoting the ranks
of the same student: (1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3),
(8, 1), (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13).
Show that the rank correlation coefficient is 0.51.


8. The marks, x and y, gained by 1,000 students for theory and
laboratory work respectively, are grouped with common class
interval of 6 marks for each variable, the frequencies for the various
classes being shown in the correlation table above. The values of
x and y indicated are the mid-values of the classes. Show that the
coefficient of correlation is 0.68, and the regression equation of
y on x

$$y = 29.7 + 0.666x.$$

9. The logarithm of the m.g.f., $K(t_1, t_2) = \log M(t_1, t_2)$, is the cumulative
function, and the cumulant $\kappa_{rs}$ is the coefficient of $t_1^rt_2^s/(r!\,s!)$
in the expansion of $K(t_1, t_2)$ in powers of $t_1$ and $t_2$. Show that

$$\kappa_{11} = \mu_{11}, \quad \kappa_{21} = \mu_{21}, \quad \kappa_{31} = \mu_{31} - 3\mu_{20}\mu_{11}, \quad \kappa_{22} = \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2.$$

10. By means of the identity

$$2\sum fuv = \sum fu^2 + \sum fv^2 - \sum f(u-v)^2$$

and the fact that $u - v$ is constant along each of one set of the
diagonal lines of a correlation table, the sum of products $\sum fuv$ may
be calculated without difficulty. Tabulate the quantities $u - v$,
$(u-v)^2$, f and $f(u-v)^2$ for each diagonal line of the correlation
table on p. 77, and deduce that $\sum f(u-v)^2 = 1147$. Using the
values found for $\sum fu^2$, $\sum fv^2$ verify that $\sum fuv = 1109$.

Deduce the same result from the identity

$$2\sum fuv = \sum f(u+v)^2 - \sum fu^2 - \sum fv^2,$$

using the other set of diagonal lines.



CHAPTER V 


FURTHER CORRELATION THEORY. 
CURVED REGRESSION LINES 

33. Arrays. Linear regression 

In the numerical illustration of § 29 all the values of x in any one
vertical array were regarded as equal to the mid-value of x for that
interval. Since the actual values of x vary over a range of 5 years,
the results obtained cannot be regarded as more than a good
approximation. A better approximation could be obtained by choosing a
smaller class interval; but that would make the numerical work
correspondingly heavier. For accurate work the class interval must
be so small that all the values of x in a vertical array are either
exactly or very nearly equal; and similarly all the y's in a horizontal
array. The theoretical proof of certain formulae, however, is not
made any more difficult by a choice of arrays sufficiently numerous
to satisfy the above conditions; for our sums are just as easy to
manage whether the number of arrays is large or small. And, if
we are dealing with a continuous distribution, we may take
infinitesimal class intervals, dx and dy, and replace our sums by
definite integrals. We assume then that, in the ith vertical array, all
the x's have the same value, $x_i$.

It will be convenient to extend our subscript notation as follows.
Though the x's in the ith vertical array all have the same value, the
y's are different. A typical pair of values in the array is $(x_i, y_{ij})$,
represented by the point $P_{ij}$, and we shall denote its frequency by $f_{ij}$.
Thus the first subscript, i, indicates the vertical array, while the
second, j, indicates the position in that array. Denoting the total
frequency in the ith array by $n_i$ we have

$$n_i = \sum_j f_{ij}, \tag{1}$$

$\sum_j$ indicating summation within that array. The mean value of y
in this array will be denoted by $\bar{y}_i$. Thus

$$n_i\bar{y}_i = \sum_j f_{ij}y_{ij}. \tag{2}$$

The mean of the array is represented by the point $G_i$, whose
coordinates are $(x_i, \bar{y}_i)$. $H_i$ is the point with abscissa $x_i$ on the line of
regression of y on x. The mean, G, of the distribution also lies on
this line of regression, and has coordinates $(\bar{x}, \bar{y})$ given by

$$N\bar{x} = \sum_i\sum_j f_{ij}x_i = \sum_i n_ix_i, \qquad N\bar{y} = \sum_i\sum_j f_{ij}y_{ij} = \sum_i n_i\bar{y}_i. \tag{3}$$

The summation over the whole distribution is thus divided into
two summations; for $\sum_j$ denotes summation for terms in the same
vertical array, and then $\sum_i$ indicates summation of the result over
all the arrays.

It is worth noting that the same equation is obtained for the line
of regression of y on x, if each value of y in any array is replaced by
the mean value of y in that array. For the equation of this regression
line depends only on $\bar{x}$, $\bar{y}$, $\sigma_x^2$ and $\mu_{11}$. And, since the above change
does not affect the x's, $\bar{x}$ and $\sigma_x^2$ are unaltered by it. So also is $\bar{y}$,
in virtue of the second equation (3). As for $\mu_{11}$ we see that

$$N\mu_{11} = \sum_i\sum_j f_{ij}x_iy_{ij} - N\bar{x}\bar{y} = \sum_i n_ix_i\bar{y}_i - N\bar{x}\bar{y}$$

by (3), and is therefore unaltered by the above change of y's.
Consequently, the regression line of y on x is unaltered. It follows
that, if the means of the vertical arrays are collinear, their line must
coincide with the line of regression of y on x. This regression is then
said to be linear. Similarly, the regression of x on y is said to be
linear if the means of the horizontal arrays all lie on the line of
regression of x on y. It is possible for either regression to be linear,
or both.
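
The statement that the line of regression of y on x is unaltered when each y is replaced by the mean of its array may be illustrated numerically. The Python sketch below uses hypothetical grouped data; the dictionary `arrays` and the function `regression_line` are names introduced here for the illustration.

```python
# Hypothetical grouped data: for each abscissa x_i, a list of (y_ij, f_ij) pairs.
arrays = {
    1.0: [(2.0, 3), (4.0, 1)],
    2.0: [(3.0, 2), (5.0, 2)],
    3.0: [(4.0, 1), (7.0, 3)],
}

def regression_line(points):
    """Least-squares line y = a + b x for weighted points (x, y, f)."""
    n = sum(f for _, _, f in points)
    mx = sum(f * x for x, _, f in points) / n
    my = sum(f * y for _, y, f in points) / n
    var_x = sum(f * (x - mx) ** 2 for x, _, f in points) / n
    cov = sum(f * (x - mx) * (y - my) for x, y, f in points) / n
    b = cov / var_x
    return my - b * mx, b

# Every observation kept as it stands.
full = [(x, y, f) for x, cell in arrays.items() for y, f in cell]

# Each y replaced by the mean of its array, carrying the array frequency n_i.
means = []
for x, cell in arrays.items():
    n_i = sum(f for _, f in cell)
    y_bar_i = sum(f * y for y, f in cell) / n_i
    means.append((x, y_bar_i, n_i))

a_full, b_full = regression_line(full)
a_means, b_means = regression_line(means)
print((a_full, b_full), (a_means, b_means))
```

The two pairs of constants coincide, though the scatter about the line is of course smaller for the array means than for the original observations.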

It was shown in § 26 that $S_y^2 = \sigma_y^2(1-r^2)$ gives the mean square
deviation of points from the line of regression of y on x. If this
regression is linear the means of the vertical arrays lie on this line
of regression; and, if further the vertical arrays all have the same
variance, this common variance must be $S_y^2$. The s.d. of each array
is then $\sigma_y\sqrt{1-r^2}$. When the vertical arrays all have the same
variance, the regression of y on x is said to be homoscedastic.

34. Correlation ratios 

Consider next the deviation $G_iP_{ij}$ of the point $P_{ij}$ from the mean
of the vertical array in which it lies. The sum of the squares of
these deviations for the whole distribution is denoted by $NS_y'^2$,
so that

$$NS_y'^2 = \sum_i\sum_j f_{ij}(y_{ij} - \bar{y}_i)^2. \tag{4}$$

Then, by analogy with § 26 (22), the correlation ratio,* $\eta_y$, of y on x
is defined by

$$\eta_y^2 = 1 - S_y'^2/\sigma_y^2, \tag{5}$$

$\eta_y$ being regarded as positive. Thus

$$S_y'^2 = \sigma_y^2(1 - \eta_y^2), \tag{6}$$

which corresponds to § 26 (21). From (6) it is clear that $\eta_y^2 \le 1$.
Also it is easy to show that $\eta_y^2 \ge r^2$. This is evident from a comparison
of (5) with § 26 (22), since $S_y'^2 \le S_y^2$, the sum of the squares of the
deviations in any array being least when they are measured from
the mean of the array. Thus

$$r^2 \le \eta_y^2 \le 1. \tag{7}$$

When the regression of y on x is linear, the straight line of means of
arrays coincides with the line of regression, and $\eta_y^2$ is then equal
to $r^2$. A non-zero value of $\eta_y^2 - r^2$ is thus associated with a departure
of the regression from linearity.


* Some writers denote this 'ratio' by

It is clear from (6) that the more nearly $\eta_y^2$ approaches unity, the
smaller is $S_y'^2$, and therefore the closer are the points to the curve of
means of the vertical arrays. If $\eta_y^2 = 1$, then $S_y'^2 = 0$, so that all the
deviations are zero, and all the points lie on the curve of means.
There is then a functional relation between x and y. We may
therefore describe the correlation ratio as a measure of the degree to
which the association between the variables approaches a functional
relationship expressible in the form y = F(x), where F(x) is a
single-valued function of x. This should be compared with the
corresponding interpretation of r in § 26.

A convenient expression for $\eta_y$ can be found, involving the
s.d. $\sigma_{my}$ of the means of the vertical arrays, each mean being weighted
with the frequency of that array. This s.d. is given by

$$N\sigma_{my}^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2. \tag{8}$$

Now

$$N\sigma_y^2 = \sum_i\sum_j f_{ij}(y_{ij} - \bar{y})^2 = \sum_i\sum_j f_{ij}(y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2,$$

the sum of products being zero, since $\sum_j f_{ij}(y_{ij} - \bar{y}_i)$ vanishes for each
array. The first sum on the right is $NS_y'^2$, in virtue of (4), and
the second is $N\sigma_{my}^2$. Consequently the equation is equivalent to

$$\sigma_y^2 = S_y'^2 + \sigma_{my}^2. \tag{9}$$

Thus the variance of the y's of the distribution is expressible as the
sum of two parts, of which the first is the variance within the
arrays, and the second the variance of the weighted means of the
arrays. Also, comparing (9) with (6), we see that

$$\eta_y^2 = \sigma_{my}^2/\sigma_y^2,$$

and therefore

$$\eta_y = \sigma_{my}/\sigma_y. \tag{10}$$

The correlation ratio $\eta_y$ is therefore the ratio of the s.d. of the
weighted means of the arrays of y's to the s.d. of all the y's of the
distribution.

In the same way, by considering horizontal arrays in each of
which the value of y is constant, we may define the correlation ratio
of x on y. For, if $G_j$ is the mean of the jth horizontal array, with
abscissa $\bar{x}_j$, and $NS_x'^2$ denotes the sum of the squares of the
deviations of the points, each from the mean of the horizontal array in
which it lies, $\eta_x$ is given by

$$\eta_x^2 = 1 - S_x'^2/\sigma_x^2, \tag{11}$$

and, as before, we have the relations 

(12) 

and

$$\eta_x = \sigma_{mx}/\sigma_x, \tag{13}$$

where $\sigma_{mx}$ is the s.d. of the means of the horizontal arrays, each
mean being weighted with the frequency of that array.
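
The resolution (9) of the variance, and the inequalities (7), may be illustrated on a small grouped distribution whose array means do not lie on a straight line. The Python sketch below is an illustration only: the figures in `arrays` are hypothetical and the variable names are introduced here.

```python
import math

# Hypothetical grouped data with a curved trend: x_i -> list of (y_ij, f_ij).
arrays = {
    1.0: [(1.0, 2), (2.0, 2)],
    2.0: [(4.0, 1), (5.0, 3)],
    3.0: [(5.0, 2), (6.0, 2)],
    4.0: [(3.0, 3), (4.0, 1)],
}

N = sum(f for cell in arrays.values() for _, f in cell)
x_bar = sum(f * x for x, cell in arrays.items() for _, f in cell) / N
y_bar = sum(f * y for cell in arrays.values() for y, f in cell) / N

var_x = sum(f * (x - x_bar) ** 2 for x, cell in arrays.items() for _, f in cell) / N
var_y = sum(f * (y - y_bar) ** 2 for cell in arrays.values() for y, f in cell) / N
cov = sum(f * (x - x_bar) * (y - y_bar)
          for x, cell in arrays.items() for y, f in cell) / N
r = cov / math.sqrt(var_x * var_y)

within = 0.0    # N * S'_y^2, as in (4)
between = 0.0   # N * sigma_my^2, as in (8)
for x, cell in arrays.items():
    n_i = sum(f for _, f in cell)
    y_bar_i = sum(f * y for y, f in cell) / n_i
    within += sum(f * (y - y_bar_i) ** 2 for y, f in cell)
    between += n_i * (y_bar_i - y_bar) ** 2

s_prime_sq = within / N
var_means = between / N
eta_sq = 1 - s_prime_sq / var_y   # definition (5)
print(r ** 2, eta_sq)
```

For figures such as these, in which the array means rise and then fall, the square of the correlation ratio is considerably greater than $r^2$.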


35. Calculation of correlation ratios 
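Formula (10), giving the value of $\eta_y$, is clearly equivalent to

$$\eta_y^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2 \big/ N\sigma_y^2. \tag{14}$$

Now the sum in the numerator may be evaluated without
calculating the deviations $\bar{y}_i - \bar{y}$. For

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i\bar{y}_i^2 - 2\bar{y}\sum_i n_i\bar{y}_i + N\bar{y}^2.$$

But

$$n_i\bar{y}_i = \sum_j f_{ij}y_{ij} = T_i,$$

where $T_i$ is the sum of the y's in the ith vertical array. Hence

$$\sum_i n_i\bar{y}_i = \sum_i T_i = N\bar{y} = T, \tag{15}$$

where T is the sum of the y's for the whole distribution. We may
therefore write

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i\frac{T_i^2}{n_i} - \frac{2T^2}{N} + \frac{T^2}{N} = \sum_i\frac{T_i^2}{n_i} - \frac{T^2}{N},$$

and (14) becomes

$$\eta_y^2 = \frac{1}{N\sigma_y^2}\left(\sum_i\frac{T_i^2}{n_i} - \frac{T^2}{N}\right). \tag{16}$$

Since the deviation from the mean is independent of the choice of
origin, while numerator and denominator of (14) are altered in the
same ratio by change of unit, the value of $\eta_y$ calculated from (16)
is unaffected by change of origin and unit.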

Example. Calculate the correlation ratios for the distribution of ages of
bridegroom and bride given in § 29.

Working in terms of v and the 5-year unit we have

$$T = -371, \qquad N = 1{,}000, \qquad T^2/N = 137.64.$$

The values of $T_i$ are given in the row prefixed V. Hence

$$\sum_i\frac{T_i^2}{n_i} = \frac{(-28)^2}{17} + \frac{(-339)^2}{332} + \dots + \frac{(26)^2}{7} = 876.81, \qquad N\sigma_v^2 = 1393.4.$$

Consequently

$$\eta_y^2 = \frac{876.81 - 137.64}{1393.4} = \frac{739.17}{1393.4},$$

and $\eta_y = 0.729$ nearly.

This is only slightly greater than the value 0.726 found for r.

The reader should verify, in the same way, that $\eta_x^2 = 0.5504$, so that
$\eta_x = 0.742$. The departure of the regressions from linearity is only slight.
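
The final step of this Example is a matter of simple arithmetic with formula (16), and may be repeated as follows. The Python sketch takes the three printed quantities (T = -371, $\sum T_i^2/n_i = 876.81$ and $N\sigma_v^2 = 1393.4$) as given, since the separate values of $T_i$ and $n_i$ are not all reproduced here; the variable names are introduced for the illustration.

```python
import math

# Quantities quoted in the Example, in the 5-year unit for v.
N = 1000
T = -371                   # sum of v over the whole distribution
sum_Ti2_over_ni = 876.81   # sum over the arrays of T_i^2 / n_i, as printed
N_var_v = 1393.4           # N * sigma_v^2, as printed

eta_sq = (sum_Ti2_over_ni - T ** 2 / N) / N_var_v   # formula (16)
eta = math.sqrt(eta_sq)
print(round(eta_sq, 4), round(eta, 3))
```

With these inputs the quotient is about 0.5305 and its square root about 0.728, which is close to the 0.729 quoted above and, like it, only slightly greater than r.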


36. Other relations 

We shall now prove that the mean square deviation of the
weighted means of the vertical arrays from the line of regression
of y on x is equal to $\sigma_y^2(\eta_y^2 - r^2)$, that is to say

$$\sum_i n_i(\bar{y}_i - Y_i)^2 = N\sigma_y^2(\eta_y^2 - r^2), \tag{17}$$

where $Y_i$ is the estimate of $y_{ij}$ from the regression equation § 25 (16).
For, subtraction of (6) from § 26 (21) shows that

$$N\sigma_y^2(\eta_y^2 - r^2) = N(S_y^2 - S_y'^2) = \sum_i\sum_j f_{ij}[(y_{ij} - Y_i)^2 - (y_{ij} - \bar{y}_i)^2]$$

$$= \sum_i\sum_j f_{ij}[\{(y_{ij} - \bar{y}_i) + (\bar{y}_i - Y_i)\}^2 - (y_{ij} - \bar{y}_i)^2] = \sum_i n_i(\bar{y}_i - Y_i)^2,$$

the sum of products vanishing, since $\sum_j f_{ij}(y_{ij} - \bar{y}_i)$ is zero for each
array. Thus (17) has been established. The difference $\eta_y^2 - r^2$ is zero
only when each of the deviations $\bar{y}_i - Y_i$ is zero. In that case the
means of arrays all lie on the line of regression, and the regression
of y on x is linear. We thus see again that a non-zero value of $\eta_y^2 - r^2$
is associated with departure of the regression from linearity.

We may prove, for use in a later chapter, that the sum of squares
$\sum_i n_i(\bar{y}_i - \bar{y})^2$ may be resolved into two separate sums. Thus

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i[(\bar{y}_i - Y_i) + (Y_i - \bar{y})]^2 = \sum_i n_i(\bar{y}_i - Y_i)^2 + \sum_i n_i(Y_i - \bar{y})^2, \tag{18}$$

the sum of products vanishing, since

$$\sum_i n_i(\bar{y}_i - Y_i)(Y_i - \bar{y}) = \sum_i\sum_j f_{ij}(y_{ij} - Y_i)(Y_i - \bar{y}) = 0$$

in virtue of § 27 (30). The sum in the first member of (18) is $N\sigma_{my}^2$,
or $N\eta_y^2\sigma_y^2$ by (10). The first sum on the right of (18) has just been
proved equal to $N\sigma_y^2(\eta_y^2 - r^2)$; and the final sum has the value
$Nr^2\sigma_y^2$, in virtue of § 27 (32). The equation (18) thus corresponds to
the identity

$$N\eta_y^2\sigma_y^2 = N\sigma_y^2(\eta_y^2 - r^2) + Nr^2\sigma_y^2. \tag{19}$$
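
The relations (17) and (18) may be checked on a small grouped distribution. The Python sketch below is illustrative only: the figures in `arrays` are hypothetical, and the three accumulated sums are given names introduced here.

```python
# Hypothetical grouped data with a curved trend: x_i -> list of (y_ij, f_ij).
arrays = {
    1.0: [(1.0, 2), (2.0, 2)],
    2.0: [(4.0, 1), (5.0, 3)],
    3.0: [(5.0, 2), (6.0, 2)],
    4.0: [(3.0, 3), (4.0, 1)],
}

N = sum(f for cell in arrays.values() for _, f in cell)
x_bar = sum(f * x for x, cell in arrays.items() for _, f in cell) / N
y_bar = sum(f * y for cell in arrays.values() for y, f in cell) / N
var_x = sum(f * (x - x_bar) ** 2 for x, cell in arrays.items() for _, f in cell) / N
var_y = sum(f * (y - y_bar) ** 2 for cell in arrays.values() for y, f in cell) / N
cov = sum(f * (x - x_bar) * (y - y_bar)
          for x, cell in arrays.items() for y, f in cell) / N

b = cov / var_x                    # regression coefficient of y on x
a = y_bar - b * x_bar
r_sq = cov ** 2 / (var_x * var_y)

means_from_line = 0.0    # sum of n_i (y_bar_i - Y_i)^2
means_about_mean = 0.0   # sum of n_i (y_bar_i - y_bar)^2
line_about_mean = 0.0    # sum of n_i (Y_i - y_bar)^2
for x, cell in arrays.items():
    n_i = sum(f for _, f in cell)
    y_bar_i = sum(f * y for y, f in cell) / n_i
    Y_i = a + b * x                # estimate from the regression equation
    means_from_line += n_i * (y_bar_i - Y_i) ** 2
    means_about_mean += n_i * (y_bar_i - y_bar) ** 2
    line_about_mean += n_i * (Y_i - y_bar) ** 2

eta_sq = means_about_mean / (N * var_y)   # by (10)
print(means_from_line, N * var_y * (eta_sq - r_sq))
```

The first printed value is the left-hand member of (17) and the second its right-hand member; the three accumulated sums likewise satisfy the resolution (18).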

37. Continuous distributions

The preceding proofs may be adapted to a continuous distribution
by an appropriate change of notation, and a replacement of sums by
definite integrals. The vertical array, whose abscissa is x, we assume
to be of infinitesimal breadth dx. Then, if f(x, y) is the relative
frequency density, the relative frequency for this array is

$$\int [f(x,y)\,dx]\,dy = f_1(x)\,dx, \tag{20}$$

where

$$f_1(x) = \int f(x,y)\,dy, \tag{21}$$

the integration with respect to y extending over the whole of that
array. The mean of the y's in the array, $\bar{y}_x$, is given by

$$\bar{y}_xf_1(x) = \int yf(x,y)\,dy, \tag{22}$$

and this is the equation of the curve of means of the vertical arrays.
The mean of the whole distribution has coordinates

$$\bar{x} = \iint xf(x,y)\,dx\,dy = \int xf_1(x)\,dx, \qquad \bar{y} = \iint yf(x,y)\,dx\,dy = \int \bar{y}_xf_1(x)\,dx, \tag{23}$$

integration with respect to x including all the vertical arrays. The
above relations correspond to the equations (1), (2) and (3) for a
discrete distribution.

The mean square deviation of the values of y, from the means
of the arrays in which they lie, is given by

$$S_y'^2 = \iint (y - \bar{y}_x)^2f(x,y)\,dx\,dy, \tag{24}$$

and $\eta_y$ is defined as before by an equation of the form (5) or (6).
The variance of the weighted means of the vertical arrays is

$$\sigma_{my}^2 = \int (\bar{y}_x - \bar{y})^2f_1(x)\,dx, \tag{25}$$

and, as before, this may be expressed in terms of $\sigma_y^2$ and $S_y'^2$, since

$$\sigma_y^2 = \iint (y - \bar{y})^2f(x,y)\,dx\,dy = \iint [(y - \bar{y}_x) + (\bar{y}_x - \bar{y})]^2f(x,y)\,dx\,dy = S_y'^2 + \sigma_{my}^2, \tag{26}$$

the integral of the product being equal to zero; and from this it
follows, as in the case of a discrete distribution, that

$$\eta_y = \sigma_{my}/\sigma_y.$$

Formulae corresponding to those of § 36 may be similarly
established. For, by subtraction of (24) from the result in the Example
of § 26, $Y$ being the ordinate of the line of regression at abscissa $x$,

$$\sigma_y^2(\eta_y^2 - r^2) = \sigma_y^2(1 - r^2) - S_y^2
= \iint [(y - Y)^2 - (y - \bar{y}_x)^2] f(x, y)\,dx\,dy$$

$$= \iint [\{(y - \bar{y}_x) + (\bar{y}_x - Y)\}^2 - (y - \bar{y}_x)^2] f(x, y)\,dx\,dy$$

$$= \iint (\bar{y}_x - Y)^2 f(x, y)\,dx\,dy = \int (\bar{y}_x - Y)^2 f_1(x)\,dx,$$

and this is the mean square deviation of the weighted means of
arrays from the line of regression of $y$ on $x$. And, corresponding to
(18), we have the resolution

$$\int (\bar{y}_x - \bar{y})^2 f_1(x)\,dx = \int [(\bar{y}_x - Y) + (Y - \bar{y})]^2 f_1(x)\,dx
= \int (\bar{y}_x - Y)^2 f_1(x)\,dx + \int (Y - \bar{y})^2 f_1(x)\,dx,$$

the various integrals corresponding to terms of the identity

$$\sigma_y^2\eta_y^2 = \sigma_y^2(\eta_y^2 - r^2) + \sigma_y^2 r^2.$$
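
The resolution of $\sigma_y^2$ into $S_y^2 + \sigma_{my}^2$ may be checked numerically. The following is only a sketch under stated assumptions: the density $f(x, y) = x + y$ on the unit square is an illustrative choice not taken from the text, the integrals are replaced by midpoint sums, and the names `density` and `variance_components` are hypothetical.

```python
import math


def density(x, y):
    # Assumed example density on the unit square (it integrates to 1).
    return x + y


def variance_components(n=100):
    """Return (sigma_y^2, S_y^2, sigma_my^2) by midpoint sums on an n x n grid."""
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]  # midpoints of the cells
    total = sum(density(x, y) for x in grid for y in grid) * h * h
    y_bar = sum(y * density(x, y) for x in grid for y in grid) * h * h / total

    var_y = s2 = var_means = 0.0
    for x in grid:
        # Relative frequency density of the vertical array, as in (21).
        f1 = sum(density(x, y) for y in grid) * h / total
        # Mean of the y's in the array, as in (22).
        ybar_x = sum(y * density(x, y) for y in grid) * h / total / f1
        for y in grid:
            weight = density(x, y) * h * h / total
            var_y += (y - y_bar) ** 2 * weight
            s2 += (y - ybar_x) ** 2 * weight          # as in (24)
        var_means += (ybar_x - y_bar) ** 2 * f1 * h   # as in (25)
    return var_y, s2, var_means


var_y, s2, var_means = variance_components()
eta_y = math.sqrt(var_means / var_y)  # correlation ratio of y on x
print(var_y, s2 + var_means, eta_y)
```

The two printed variances agree to rounding error, because the cross-product term vanishes for the discretized distribution exactly as it does in (26).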

38. Bivariate normal distribution

We shall now consider briefly the continuous bivariate distribution,
which is a generalization of the normal distribution discussed
in Chapter III. It may be introduced simply as follows.* Assume
first that the variable $x$ is normally distributed with s.d. $\sigma_1$. Then,
if the variable is measured from its mean, the probability that a
random value of $x$ will fall in the interval $dx$ is

$$\frac{1}{\sigma_1\sqrt{2\pi}} \exp\left[-\frac{x^2}{2\sigma_1^2}\right] dx.$$

Assume next that the regression of $y$ on $x$ is linear and
homoscedastic. Then, if $\sigma_2$ is the s.d. of $y$ in the distribution, the common
variance of the arrays of $y$'s is $\sigma_2^2(1 - \rho^2)$, where $\rho$ is the† coefficient
of correlation between the variables. Finally, assume that each
array of $y$'s is normally distributed. Then, since the mean of each
array is on the line of regression

$$y = \rho x \sigma_2/\sigma_1, \tag{27}$$

and the variance of each array is as stated, the probability that a
value of $y$, taken at random in an assigned vertical array, will fall
in the interval $dy$ is

$$\frac{1}{\sigma_2\sqrt{2\pi(1 - \rho^2)}} \exp\left[-\frac{1}{2\sigma_2^2(1 - \rho^2)}\left(y - \rho x\frac{\sigma_2}{\sigma_1}\right)^2\right] dy.$$

* Cf. Rietz, 1927, 2, pp. 104-7.

† In the theory of sampling, $r$ refers to the sample, and $\rho$ to the
population from which the sample is drawn.

By the theorem of compound probability, the chance of a pair of
values $(x, y)$ falling in the elementary rectangle $dx\,dy$ is the product
of the two probabilities just written down.
The probability density $\phi(x, y)$ for the distribution is therefore

$$\phi(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right]. \tag{28}$$

Such a distribution is called a bivariate normal distribution, and the
variables are said to be normally correlated. The surface $z = \phi(x, y)$
is the normal correlation surface. Since (28) is of the same form in $x$
as in $y$, we may conclude that the regression of $x$ on $y$ is also linear,
the variance of each array of $x$'s being $\sigma_1^2(1 - \rho^2)$. The values of $x$
in each such array are normally distributed, with mean on the line
of regression of $x$ on $y$, whose equation is

$$x = \rho y \sigma_1/\sigma_2. \tag{29}$$

The m.g.f. for the distribution is found below,* and from it the first
few moments are calculated.

The nature of the normal correlation surface is indicated in the
diagram. The curves along which the probability density (or the
relative frequency density) is constant are the homothetic ellipses

$$\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = \lambda^2. \tag{30}$$

With respect to these the line of regression of $y$ on $x$ is conjugate to
the $y$-axis. For the locus of the mid-points of chords $x = \text{const.}$ is
the straight line (27). Similarly the line of regression of $x$ on $y$ is
conjugate to the $x$-axis.†
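
The construction used above (a normal $x$, followed by a normal array of $y$'s centred on the line of regression) can be imitated by simulation. This is a minimal sketch, not part of the original treatment: the function names, the sample size and the parameter values $\sigma_1 = 2$, $\sigma_2 = 3$, $\rho = 0.6$ are all assumptions chosen for illustration.

```python
import math
import random


def sample_bivariate_normal(n, sigma1, sigma2, rho, seed=0):
    """Draw n pairs (x, y) by the two-stage construction of the text."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma1)                    # x normal with s.d. sigma1
        array_mean = rho * x * sigma2 / sigma1        # line of regression (27)
        array_sd = sigma2 * math.sqrt(1.0 - rho ** 2)  # s.d. of each array of y's
        pairs.append((x, rng.gauss(array_mean, array_sd)))
    return pairs


def correlation(pairs):
    """Product-moment coefficient of correlation of a list of pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = sample_bivariate_normal(20000, sigma1=2.0, sigma2=3.0, rho=0.6, seed=1)
r = correlation(pairs)
sd_y = math.sqrt(sum(y * y for _, y in pairs) / len(pairs))
print(r, sd_y)  # r should lie near rho, and sd_y near sigma2
```

The sample correlation and the s.d. of the simulated $y$'s should fall close to the assumed $\rho$ and $\sigma_2$, which is what the symmetry of (28) in $x$ and $y$ leads us to expect.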


* See Ex. V, 3.

† For further properties of this distribution see Exx. 4, 5 and 6 at the
end of this chapter.


39. Intraclass correlation

Let us now consider the correlation between the measures of
some common characteristic for pairs of members of the same family
or class. For example, we may be interested in the correlation
between the weights of brothers, or the heights of sisters. The
relation between two members of the same family is a reciprocal
one; for, if P belongs to the same family as Q, then Q belongs to the
same family as P. Each pair of members, P and Q, will therefore
contribute two entries to the correlation table. In one of them $x$
will be the measure of the characteristic for P, and $y$ the measure
for Q; in the other, $x$ will be the measure for Q, and $y$ for P. The table
will thus be symmetrical.

We shall consider only the case in which each of the $h$ families
has the same number, $k$, of members. Then there are $k(k - 1)$ pairs
of values for each family, and

$$N = hk(k - 1) \tag{31}$$

gives the total number of pairs of values in the table. Let $x_{ij}$ denote
the measure of the characteristic for the $j$th member of the $i$th
family. Thus the first subscript indicates the family, and the
second the particular member of that family. Consequently $i$ takes
the values $1, 2, \dots, h$, and $j$ the values $1, 2, \dots, k$. The $x$'s and the $y$'s
of the correlation table are the $hk$ values $x_{ij}$ in different orders. Any
one value $x_{ij}$, occurring as an $x$ in the table, will have as its $y$ each
of the other $k - 1$ values for the same family. Thus each value $x_{ij}$
occurs $k - 1$ times as an $x$; and the mean $\bar{x}$ for the bivariate
distribution is therefore

$$\bar{x} = \frac{1}{hk(k - 1)} \sum_i \sum_j (k - 1)\,x_{ij} = \frac{1}{hk} \sum_i \sum_j x_{ij}, \tag{32}$$

and, since the table is symmetrical, $\bar{y}$ has the same value. The
variance $\sigma_x^2$ of the $x$'s is the same as the variance of the $hk$ values $x_{ij}$,
since each of these occurs the same number of times in the
distribution. Thus

$$\sigma_x^2 = \frac{1}{hk} \sum_i \sum_j (x_{ij} - \bar{x})^2, \tag{33}$$

and $\sigma_y^2$ has the same value, in virtue of symmetry. We may denote
this common variance by $\sigma^2$, so that

$$\sigma_x^2 = \sigma_y^2 = \sigma^2. \tag{34}$$

The coefficient of intraclass correlation, $r$, is given by the usual
formula, which in this case is equivalent to

$$hk(k - 1)\sigma^2 r = \sum_i \sum_j \sum_l (x_{ij} - \bar{x})(x_{il} - \bar{x}) \tag{35}$$

$$(j \ne l;\ j, l = 1, 2, \dots, k;\ i = 1, 2, \dots, h).$$

The sum is a triple one; for the product $(x_{ij} - \bar{x})(x_{il} - \bar{x})$ is the
product of the deviations from the mean for the $j$th and $l$th members
of the $i$th family, and the sum must include all such products for
different values of $j$ and $l$, and for all the families. We carry out first
the summation with respect to $l$, observing that $l$ takes all integral
values from 1 to $k$ except the value $j$. Thus the sum of the terms
$x_{il}$ includes all values for the $i$th family except $x_{ij}$, and may therefore
be written $k\bar{x}_i - x_{ij}$, where $\bar{x}_i$ is the mean for the $i$th family. The sum
of the terms $\bar{x}$ for the $k - 1$ values of $l$ is $(k - 1)\bar{x}$. The triple sum in
(35) may therefore be written as the double sum

$$\sum_i \sum_j (x_{ij} - \bar{x})[k\bar{x}_i - x_{ij} - (k - 1)\bar{x}]
= k \sum_i \sum_j (x_{ij} - \bar{x})(\bar{x}_i - \bar{x}) - \sum_i \sum_j (x_{ij} - \bar{x})^2.$$

Now by (33) the last sum on the right is equal to $hk\sigma^2$. To evaluate
the other we first carry out the summation with respect to $j$. Thus

$$k \sum_i \sum_j (x_{ij} - \bar{x})(\bar{x}_i - \bar{x}) = k \sum_i (k\bar{x}_i - k\bar{x})(\bar{x}_i - \bar{x}) = hk^2\sigma_m^2,$$

where $\sigma_m^2$ is the variance of the means of the families.

Substituting these values in (35) we obtain

$$hk(k - 1)\sigma^2 r = hk^2\sigma_m^2 - hk\sigma^2.$$

The coefficient of intraclass correlation is therefore given by the
formula

$$r = \frac{k\sigma_m^2 - \sigma^2}{(k - 1)\sigma^2}. \tag{36}$$
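
Formula (36) may be compared with a direct calculation from the symmetrical correlation table. The sketch below is illustrative only; the function names are hypothetical, and the data are the heights of brothers quoted in Examples V, 1 of this chapter. Exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction


def intraclass_by_formula(families):
    """Coefficient of intraclass correlation from formula (36)."""
    h, k = len(families), len(families[0])
    values = [Fraction(v) for fam in families for v in fam]
    mean = sum(values) / (h * k)
    var = sum((v - mean) ** 2 for v in values) / (h * k)         # (33)
    fam_means = [sum(Fraction(v) for v in fam) / k for fam in families]
    var_means = sum((m - mean) ** 2 for m in fam_means) / h      # variance of family means
    return (k * var_means - var) / ((k - 1) * var)


def intraclass_by_table(families):
    """Same coefficient from the table of all ordered pairs within families."""
    pairs = [(Fraction(a), Fraction(b))
             for fam in families
             for j, a in enumerate(fam)
             for l, b in enumerate(fam) if j != l]
    n = len(pairs)                                    # N = hk(k - 1), as in (31)
    mean = sum(x for x, _ in pairs) / n
    var = sum((x - mean) ** 2 for x, _ in pairs) / n  # same for the y's by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / n
    return cov / var


heights = [(67, 68, 69), (68, 68, 71), (68, 70, 72), (70, 70, 73), (71, 72, 73)]
print(intraclass_by_formula(heights), intraclass_by_table(heights))  # both 1/3
```

Both routes give 1/3 for these data, in agreement with the value stated in Examples V, 1.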


CURVED REGRESSION LINES

40. Polynomial regression. Normal equations

We have seen that the line of regression of $y$ on $x$ gives the best
representation of the behaviour of $y$ with change of $x$, that can be
given by a straight line, the term 'best' indicating that the sum of
the squares of the deviations (or 'residuals') is less for this straight
line than for any other. But it is often apparent from the data that
the regression of $y$ on $x$ is far from linear; and it may be desirable to
find an equation of regression which affords a better representation
of the behaviour of $y$ than can be given by a straight line. The
simplest type of non-linear regression equation is that in which
one of the variables is expressed as a polynomial in the other.
Polynomial regression of $y$ on $x$ is therefore represented by an
equation of the form

$$y = b_0 + b_1 x + b_2 x^2 + \dots + b_k x^k, \tag{37}$$

in which the coefficients $b_s$ are constants. Our problem is to
determine these constants so that the sum of the squares of the residuals
is a minimum. The choice of the degree, $k$, of the polynomial is at
our disposal. If there are $n$ different pairs of values in the
distribution, we can make the curve pass through all the representative
points by choosing $k = n - 1$. But, to keep the arithmetical work
reasonably simple, $k$ must be fairly small, say 2, 3, or 4. The
distribution of points in the scatter diagram will frequently suggest the
shape of the curve of regression, and thus a suitable value for $k$.
Having decided on the value of $k$, we determine the coefficients
$b_s$ by the method of least squares.

With the notation of § 23 suppose there are $n$ different pairs of
values, $f_i$ being the frequency of the pair $(x_i, y_i)$, and the total
frequency $N$ being given by

$$N = \sum_{i=1}^{n} f_i. \tag{38}$$

Let $H_i$ be the point on the curve (37) with abscissa $x_i$. Then its
ordinate $Y_i$ has the value

$$Y_i = b_0 + b_1 x_i + \dots + b_k x_i^k = \sum_{s=0}^{k} b_s x_i^s, \tag{39}$$

and the deviation of the point $P_i$ from the curve of regression is

$$H_i P_i = y_i - Y_i. \tag{40}$$

The sum of the squares of the deviations is

$$NS^2 = \sum_i f_i (y_i - Y_i)^2. \tag{41}$$

We have to choose the coefficients $b_s$ so that this sum is a minimum;
and this is done by equating to zero the partial derivatives of $S^2$
with respect to these coefficients. We thus obtain the $k + 1$ normal
equations

$$\sum_i f_i x_i^s (y_i - Y_i) = 0 \tag{42}$$

$$(i = 1, 2, \dots, n;\ s = 0, 1, 2, \dots, k).$$

Written separately, and in terms of the values of the given
distribution, these are

$$\sum_i f_i (y_i - b_0 - b_1 x_i - \dots - b_k x_i^k) = 0,$$
$$\sum_i f_i x_i (y_i - b_0 - b_1 x_i - \dots - b_k x_i^k) = 0,$$
$$\dots \tag{42'}$$
$$\sum_i f_i x_i^k (y_i - b_0 - b_1 x_i - \dots - b_k x_i^k) = 0.$$

The coefficients $b_s$ are determined from these $k + 1$ equations, which
involve the sums of powers of $x$ from $\sum f_i x_i$ to $\sum f_i x_i^{2k}$, and sums of
products from $\sum f_i y_i$ to $\sum f_i x_i^k y_i$. For the particular case $k = 1$ these
equations are the same as § 25 (12).

When the values $x_i$ correspond to equal increments, $h$, and the
distribution of the frequencies is symmetrical about $\bar{x}$, the
equations (42') may be much simplified. Suppose first that $n$ is odd
and equal to $2m + 1$. Then, by taking the origin of $x$ at the middle
value of that variable, and the common increment $h$ as unit of
measurement, we have for the values of $x$ in the distribution

$$-m,\ -(m - 1),\ \dots,\ -1,\ 0,\ 1,\ \dots,\ (m - 1),\ m.$$

Hence, owing to the symmetry of the distribution of $f$'s, the sums
of the odd powers of $x$ are all zero. The sums of the even powers
may be written down from tables or algebraical formulae. If,
however, $n$ is even and equal to $2m$, we take the origin of $x$ at the
mean of the middle pair of values, and $\tfrac{1}{2}h$ as the new unit. The
values of $x$ then become

$$-(2m - 1),\ \dots,\ -3,\ -1,\ 1,\ 3,\ \dots,\ (2m - 1),$$

and the sums of the odd powers of $x$ vanish as before. Moreover,
the equations (42') then consist of two groups, one involving
$b_0, b_2, \dots$, the other $b_1, b_3, \dots$. Their solution is thus simplified.

Example. Fit a parabolic curve of regression of y on x to the seven pairs
of values

x:  1.0  1.5  2.0  2.5  3.0  3.5  4.0
y:  1.1  1.3  1.6  2.0  2.7  3.4  4.1

The dot diagram of the seven pairs of values suggests a parabola. Since n
is odd, and the values of x correspond to equal increments 0.5, we take the
origin at the middle value 2.5 with 0.5 as the new unit. This is equivalent to
the transformation

$$u = 2x - 5.$$

Each frequency $f_i$ is unity. The sums of odd powers of u are zero, and the
calculation may be arranged as in the accompanying table. The normal
equations are

$$16.2 = 7b_0 + 28b_2, \qquad 14.3 = 28b_1, \qquad 69.9 = 28b_0 + 196b_2,$$

which give immediately

$$b_0 = 2.07, \qquad b_1 = 0.511, \qquad b_2 = 0.061.$$

The regression equation is therefore

$$y = 2.07 + 0.511u + 0.061u^2 = 2.07 + 0.511(2x - 5) + 0.061(2x - 5)^2,$$

which simplifies to $y = 1.04 - 0.20x + 0.24x^2$.

   x       u      y     u^2    u^4     uy     u^2 y
  1.0     -3     1.1     9     81     -3.3     9.9
  1.5     -2     1.3     4     16     -2.6     5.2
  2.0     -1     1.6     1      1     -1.6     1.6
  2.5      0     2.0     0      0      0       0
  3.0      1     2.7     1      1      2.7     2.7
  3.5      2     3.4     4     16      6.8    13.6
  4.0      3     4.1     9     81     12.3    36.9
 Totals    -    16.2    28    196     14.3    69.9
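
The arithmetic of this Example may be reproduced by forming the normal equations (42') and solving them directly. What follows is a sketch only: the function name `fit_polynomial` is hypothetical, and Gaussian elimination is simply one convenient way of solving the equations, not a method prescribed by the text.

```python
def fit_polynomial(xs, ys, k, freqs=None):
    """Least-squares coefficients b_0..b_k from the normal equations (42')."""
    if freqs is None:
        freqs = [1] * len(xs)
    m = k + 1
    # Left side: sums of powers of x; right side: sums of products with y.
    a = [[sum(f * x ** (r + s) for f, x in zip(freqs, xs)) for s in range(m)]
         for r in range(m)]
    b = [sum(f * x ** r * y for f, x, y in zip(freqs, xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for c in range(col, m):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        tail = sum(a[row][c] * coeffs[c] for c in range(row + 1, m))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.3, 1.6, 2.0, 2.7, 3.4, 4.1]
us = [2 * x - 5 for x in xs]            # the transformation u = 2x - 5
b0, b1, b2 = fit_polynomial(us, ys, k=2)
print(round(b0, 2), round(b1, 3), round(b2, 3))
```

Fitting in $u$ rather than in $x$ keeps the matrix of sums small and well conditioned, which is the computational counterpart of the simplification by symmetry described above.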


41. Index of correlation 

When considering the line of regression of $y$ on $x$ we saw that
$|r|$ is the coefficient of correlation between the values $y_i$ and their
estimates $Y_i$ found from the regression equation, and that this
coefficient is connected with the mean square deviation from the
line of regression by the formula $S^2 = \sigma_y^2(1 - r^2)$. We propose to
show that a corresponding relation holds for polynomial regression.
Multiplying the normal equations (42') by $b_0, b_1, \dots, b_k$ respectively
and adding we obtain, in virtue of (39),

$$\sum_i f_i Y_i (y_i - Y_i) = 0. \tag{43}$$

Consequently (41) is equivalent to

$$NS^2 = \sum_i f_i y_i (y_i - Y_i) = \sum_i f_i y_i^2 - \sum_i f_i y_i Y_i. \tag{44}$$

From the first of the normal equations, $\sum_i f_i (y_i - Y_i) = 0$, it follows
that the $y$'s and the $Y$'s have the same mean. If this is taken as
origin for both variables, their variances are given by

$$N\sigma_y^2 = \sum_i f_i y_i^2, \qquad N\sigma_Y^2 = \sum_i f_i Y_i^2,$$

and the coefficient of correlation, $R$, between the $y_i$ and the $Y_i$ by

$$NR\sigma_y\sigma_Y = \sum_i f_i y_i Y_i = \sum_i f_i Y_i^2 = N\sigma_Y^2,$$

in virtue of (43), so that

$$R\sigma_y = \sigma_Y. \tag{45}$$

Consequently (44) is equivalent to

$$S^2 = \sigma_y^2(1 - R^2), \tag{46}$$

which is analogous to § 26 (21).

The coefficient $R$ is thus an indication of the closeness with which
the points $P_i$ of the scatter diagram approximate to the regression
curve (37). If $R = 1$, then $S^2 = 0$, and all the points $P_i$ lie on the
curve. For this reason $R$ is often referred to as the index of
correlation for the regression curve (37). With each curve of regression
there is associated such an index, given by

$$R^2 = 1 - S^2/\sigma_y^2, \tag{47}$$

where $S^2$ is the mean square deviation of the points $P_i$ from the
curve of regression. In virtue of (43) and (44) $NS^2$ in the case
of polynomial regression is given by

$$NS^2 = \sum_i f_i y_i^2 - \sum_{s=0}^{k} b_s \sum_{i=1}^{n} f_i x_i^s y_i. \tag{48}$$

Further, the relations (30) and (31) of § 27 hold also for
polynomial regression. For, in virtue of (43) and the first of the normal
equations, we have

$$\sum_i f_i (y_i - Y_i)(Y_i - \bar{y}) = 0. \tag{49}$$

Then the sum of the squares of the deviations of the $y$'s from their
mean may be expressed

$$\sum_i f_i (y_i - \bar{y})^2 = \sum_i f_i [(y_i - Y_i) + (Y_i - \bar{y})]^2
= \sum_i f_i (y_i - Y_i)^2 + \sum_i f_i (Y_i - \bar{y})^2, \tag{50}$$

in consequence of (49). This equation corresponds to the identity

$$N\sigma_y^2 = N\sigma_y^2(1 - R^2) + NR^2\sigma_y^2,$$

as is evident from (45) and (46).
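
The relations (47) and (50) can be illustrated on the seven pairs of values of the Example of § 40. The sketch below is an illustration under assumptions: the variable names are hypothetical, and the coefficients are found from the three normal equations of that Example, solved in the two groups permitted by the symmetry of the $u$'s.

```python
import math

u = [-3, -2, -1, 0, 1, 2, 3]
y = [1.1, 1.3, 1.6, 2.0, 2.7, 3.4, 4.1]
n = len(u)

# Sums entering the normal equations (odd powers of u vanish by symmetry).
s2 = sum(t ** 2 for t in u)
s4 = sum(t ** 4 for t in u)
sy = sum(y)
suy = sum(t * v for t, v in zip(u, y))
su2y = sum(t * t * v for t, v in zip(u, y))

b1 = suy / s2                        # group involving b_1 only
det = n * s4 - s2 * s2
b0 = (sy * s4 - s2 * su2y) / det     # group involving b_0 and b_2
b2 = (n * su2y - s2 * sy) / det

Y = [b0 + b1 * t + b2 * t * t for t in u]    # ordinates of the curve, as in (39)
mean_y = sy / n
var_y = sum((v - mean_y) ** 2 for v in y) / n
S2 = sum((v - w) ** 2 for v, w in zip(y, Y)) / n          # as in (41), with N = n
var_Y = sum((w - mean_y) ** 2 for w in Y) / n

R_from_47 = math.sqrt(1 - S2 / var_y)                     # index of correlation (47)
cov = sum((v - mean_y) * (w - mean_y) for v, w in zip(y, Y)) / n
R_direct = cov / math.sqrt(var_y * var_Y)                 # correlation of y_i with Y_i
print(R_from_47, R_direct)
```

The two values of $R$ agree, and `var_y` equals `S2 + var_Y` to rounding error, which is the resolution (50) divided by $N$.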

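Example. Prove, as in § 36, that $\sigma_y^2(\eta_y^2 - R^2)$ is the mean square deviation
of the weighted means of the vertical arrays from the curve of regression.
Also that, if $n_i$ is the frequency in the $i$th vertical array,

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i(\bar{y}_i - Y_i)^2 + \sum_i n_i(Y_i - \bar{y})^2,$$

corresponding to the identity

$$\sigma_y^2\eta_y^2 = \sigma_y^2(\eta_y^2 - R^2) + \sigma_y^2 R^2.$$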
42. Some related regressions

A plotting of the values of $x$ against the logarithms of the
corresponding values of $y$ may indicate an approximation to a linear
relation between $x$ and $\log y$. In such a case we find a regression
equation of the form

$$y = ca^x, \tag{51}$$

where $c$ and $a$ are constants. For this relation is equivalent to

$$\log y = \log c + x \log a.$$

If then we find the line of regression of $\log y$ on $x$, the constants of
the equation determine both $c$ and $a$.

Similarly, if the plotting of $\log x$ against $\log y$ indicates that the
relation between these quantities approximates to linear, we find
a regression equation of the form

$$y = cx^b, \tag{52}$$

which is equivalent to

$$\log y = \log c + b \log x.$$

We have therefore to find the line of regression of $\log y$ on $\log x$, and
the constants of the equation determine $c$ and $b$.
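
As an illustration of the first of these reductions, the constants of $y = ca^x$ may be found from the line of regression of $\log y$ on $x$. The following is a minimal sketch; the function name `fit_exponential` and the test values $c = 2$, $a = 3$ are assumptions introduced for the illustration. It should be borne in mind that the least-squares condition is then applied to the residuals of $\log y$, not to those of $y$ itself.

```python
import math


def fit_exponential(xs, ys):
    """Constants (c, a) of y = c * a**x from the regression of log y on x."""
    n = len(xs)
    logs = [math.log(v) for v in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    slope = (sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_log - slope * mean_x
    # intercept = log c and slope = log a
    return math.exp(intercept), math.exp(slope)


xs = [0, 1, 2, 3, 4]
ys = [2 * 3 ** x for x in xs]      # exact values of 2 * 3**x, assumed for the check
c, a = fit_exponential(xs, ys)
print(c, a)
```

The same function serves for $y = cx^b$ if $\log x$ is supplied in place of $x$, the slope then being $b$ itself and not $\log a$.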

We might also find a regression equation of the form

$$y = ca^{f(x)}, \tag{53}$$

where $f(x)$ is a polynomial in $x$. For this is equivalent to

$$\log y = \log c + (\log a) f(x),$$

and therefore requires a polynomial regression of $\log y$ on $x$.

More generally, the principle of least squares may be employed
to find a regression equation of the form*

$$y = b_1 X_1 + b_2 X_2 + \dots + b_p X_p,$$

where $X_1, X_2, \dots, X_p$ are any functions of the independent variable $x$.
The argument used in § 40 leads to the normal equations

$$\sum_i f_i X_1 (y_i - Y_i) = 0, \quad \dots, \quad \sum_i f_i X_p (y_i - Y_i) = 0,$$

each $X_s$ being evaluated at $x_i$,
for the determination of the coefficients $b_s$. Multiplying these
equations by $b_1, \dots, b_p$ respectively and adding, we find

$$\sum_i f_i Y_i (y_i - Y_i) = 0,$$

so that the sum of squares of the deviations from the regression
curve is

$$NS^2 = \sum_i f_i y_i (y_i - Y_i),$$

as in § 41. If one of the quantities $X_s$ is taken as unity (or a constant),
the mean of the $y$'s is equal to that of the $Y$'s. The argument of § 41
then applies to the present case, leading to (45), (46), (49) and (50).

* Cf. Fisher, Annals of Eugenics, vol. ix, part 3, p. 238 (1939).
EXAMPLES V

1. The heights of brothers, in five families of three each, are
as follows: (67, 68, 69), (68, 68, 71), (68, 70, 72), (70, 70, 73) and
(71, 72, 73) inches. Show that the mean heights for the families
are 68, 69, 70, 71, 72 inches, and the general mean $\bar{x} = 70$. Also
$\sigma^2 = 18/5$, $\sigma_m^2 = 2$, and the coefficient of intraclass correlation of
heights for the five families is 1/3.

2. Verify that, for the distribution of Ex. IV, 8, the correlation
ratios are $\eta_y = 0.695$ and $\eta_x = 0.686$ approximately.
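3. Moments and m.g.f. for the bivariate normal distribution. The
moments may be deduced from the m.g.f. as defined in § 31. To
calculate this function let $\sigma_1$ and $\sigma_2$ be the s.d.'s of $x$ and $y$ in the
distribution, and $\rho$ the correlation between these variates. Then
the m.g.f. is given by

$$M(t_1, t_2) = c \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[t_1 x + t_2 y - \frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right] dx\,dy,$$

where

$$c = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}.$$

By transforming the exponential we may write this

$$M(t_1, t_2) = c \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[t_1 x + t_2 y - \frac{y^2}{2\sigma_2^2} - \frac{1}{2\sigma_1^2(1 - \rho^2)}\left(x - \rho y\frac{\sigma_1}{\sigma_2}\right)^2\right] dx\,dy.$$

Carrying out the integration with respect to $x$, and rearranging the
exponential, we have

$$M(t_1, t_2) = \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left[\tfrac{1}{2}(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)\right] \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma_2^2}(y - \rho t_1\sigma_1\sigma_2 - t_2\sigma_2^2)^2\right] dy$$

$$= \exp\left[\tfrac{1}{2}(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)\right],$$

which is the required m.g.f.

To calculate the various moments we have only to expand this
function in powers of $t_1$ and $t_2$. The expansion is

$$1 + \tfrac{1}{2}(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2) + \tfrac{1}{8}(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)^2 + \dots.$$

Since $\mu_{rs}$ is the coefficient of $t_1^r t_2^s / r!\,s!$ in this expansion we have

$$\mu_{20} = \sigma_1^2, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{02} = \sigma_2^2, \quad \mu_{31} = 3\rho\sigma_1^3\sigma_2,$$

$$\mu_{22} = (1 + 2\rho^2)\sigma_1^2\sigma_2^2, \quad \mu_{13} = 3\rho\sigma_1\sigma_2^3, \quad \mu_{40} = 3\sigma_1^4, \quad \mu_{04} = 3\sigma_2^4,$$

and so on.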

4. Show that the area of the ellipse, § 38 (30), of constant
probability density is $\pi\lambda^2\sigma_1\sigma_2/\sqrt{1 - \rho^2}$; and hence that the area of the
strip between the ellipses corresponding to the parameter values
$\lambda$ and $\lambda + d\lambda$ is $2\pi\lambda\,d\lambda\,\sigma_1\sigma_2/\sqrt{1 - \rho^2}$. Deduce the probability

$$\exp[-\lambda^2/2(1 - \rho^2)]\,\lambda\,d\lambda/(1 - \rho^2)$$

that a pair of values $(x, y)$ chosen at random, will be represented by
a point inside this strip; and hence by integration the probability
that the point will fall inside the ellipse $\lambda$ is

$$1 - \exp[-\lambda^2/2(1 - \rho^2)].$$

If this probability is $\tfrac{1}{2}$, then $\lambda^2 = 1.3863(1 - \rho^2)$.

Also show that, for a given value of $d\lambda$, the probability that the
point $(x, y)$ will fall in the strip between the ellipses $\lambda$ and $\lambda + d\lambda$ is
a maximum when $\lambda^2 = 1 - \rho^2$. This determines the 'ellipse of
maximum probability'. (Cf. Rietz, 1927, 2, pp. 108-10.)

5. The variates $x$ and $y$ are normally correlated, and $\xi$, $\eta$ are
defined by

$$\xi = x\cos\theta + y\sin\theta, \qquad \eta = y\cos\theta - x\sin\theta.$$

Show that $\xi$, $\eta$ will be uncorrelated if

$$\tan 2\theta = 2\rho\sigma_1\sigma_2/(\sigma_1^2 - \sigma_2^2).$$

The above transformation corresponds to a rotation of rectangular
axes of coordinates; and $\theta$ determines the directions of the principal
axes of the ellipses of constant density.

Show that, if $\xi$, $\eta$ are thus uncorrelated, and $\sigma_\xi^2$, $\sigma_\eta^2$ are their
variances, then

$$\sigma_\xi\sigma_\eta = \sigma_1\sigma_2\sqrt{1 - \rho^2}, \qquad \sigma_\xi^2 + \sigma_\eta^2 = \sigma_1^2 + \sigma_2^2.$$

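6. Show that, if $\xi$, $\eta$ are independent normal variates, with
variances $\sigma_\xi^2$, $\sigma_\eta^2$, and $x$, $y$ are defined by

$$x = \xi\cos\theta + \eta\sin\theta, \qquad y = \eta\cos\theta - \xi\sin\theta,$$

the coefficient of correlation between $x$ and $y$ is given by (cf. Ex. 5)

$$\rho^2 = \frac{(\sigma_\xi^2 - \sigma_\eta^2)^2 \sin^2 2\theta}{(\sigma_\xi^2 - \sigma_\eta^2)^2 \sin^2 2\theta + 4\sigma_\xi^2\sigma_\eta^2},$$

which is numerically greatest when $\theta = \pm\tfrac{1}{4}\pi$. The extreme values
of $\rho$ are $\pm(\sigma_\xi^2 - \sigma_\eta^2)/(\sigma_\xi^2 + \sigma_\eta^2)$.

Also show that the points of inflexion of sections of the normal
correlation surface, by planes through the $z$-axis, lie on the elliptic
cylinder $\xi^2\sigma_\eta^2 + \eta^2\sigma_\xi^2 = \sigma_\xi^2\sigma_\eta^2$. (Cf. Yule, 1897, 1.)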

7. The profits, £y, of a certain company in the xth year of its
life are given by

x:     1      2      3      4      5
y:  1250   1400   1650   1950   2300

Taking $u = x - 3$ and $v = (y - 1650)/50$, show that the parabolic
regression of $v$ on $u$ is

$$v + 0.086 = 5.30u + 0.643u^2,$$

and deduce that the parabolic regression of $y$ on $x$ is

$$y = 1140 + 72.14x + 32.14x^2.$$


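8. Let $x$, $y$ be normally correlated variates with zero means as
in § 38. Writing

$$w = \frac{x}{\sigma_1}, \qquad z = \frac{1}{\sqrt{1 - \rho^2}}\left(\frac{y}{\sigma_2} - \frac{\rho x}{\sigma_1}\right),$$

show that

$$\frac{\partial(w, z)}{\partial(x, y)} = \frac{1}{\sigma_1\sigma_2\sqrt{1 - \rho^2}}$$

and

$$w^2 + z^2 = \frac{1}{1 - \rho^2}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right).$$

Deduce that the joint probability differential of $w$ and $z$ is

$$dP = \frac{1}{2\pi} \exp[-\tfrac{1}{2}(w^2 + z^2)]\,dw\,dz,$$

and hence that $w$, $z$ are independent normal variates, with zero
means and unit s.d.'s. In other words $w$ and $z$ are independent
standard normal variates.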



CHAPTER VI 


THEORY OF SIMPLE SAMPLING 

43. Random sampling from a population

In order to examine a large population with respect to a specified 
characteristic, the statistician chooses a sample of individuals from 
that population and, from the properties of the sample relating to 
the given characteristic, he endeavours to estimate those of the 
population. Suppose that the characteristic considered is the height 
of the individual. Then the assemblage of heights of all the individuals 
in the population is called a population or universe of heights, and 
those of the individuals in the sample is a sample of heights from that 
population. Similarly, we might consider populations of weights, 
wages, yields of grain, etc. In the same way, if our consideration is 
the percentage of male births in a very large population of births, 
we may be obliged, in estimating this percentage, to confine our 
attention to the data provided by a sample of such births. The theory 
of sampling is concerned, first, with estimating the properties of the 
population from those of the sample, and secondly, with gauging 
the precision of the estimates, i.e. with ascertaining the deviations 
from the true values that may be expected in the estimates obtained. 

Fundamental to the theory is the concept of random sampling. 
This is defined by the property that, in the selection of an individual 
from the population, each member of the population has the same 
chance of being chosen.* Statisticians have developed techniques 
for ensuring, as far as possible, that their sampling is random; but 
the reader who wishes to study the details of these techniques must 
consult other works.†

* See also Kendall, 1941, 2.
† E.g. Yule and Kendall, 1937, 1, pp. 836-40.

SAMPLING OF ATTRIBUTES

44. Simple sampling of attributes

In the sampling of attributes, as distinct from the sampling of
values of a variable such as height, we are concerned only with the
possession or non-possession of some specified attribute or
characteristic by the individual selected in sampling. For instance, in sampling
from births we may be concerned only whether the baby is male or
not. In sampling from a population of men our consideration may
be whether they are smokers or non-smokers. The choosing of an
individual in sampling may be called a 'trial', and the possession
of the specified attribute by the individual selected a 'success'.
Simple sampling is random sampling with the further provision that 
the probability p of success is the same at each trial. Thus p is a 
constant in the process; and the probability of success at any trial 
is independent of the success or failure of preceding trials. The value 
of p is the relative frequency of the occurrence of the attribute in 
the population from which the sample is drawn. Hence, for the 
sampling to be simple, either the population must be very large, or 
the individual selected must be returned to the population before 
the next trial, success or failure having been noted. 

The problem connected with the drawing of a simple sample of 
n members is thus identical with that of a series of n independent 
trials, with constant probability p of success; and the results of 
§§11 and 17 are applicable. The probabilities of 0, 1, 2, ... successes 
in a simple sample of n members are thus the terms of the binomial 
expansion of $(q + p)^n$. The binomial probability distribution thus
determined is called the sampling distribution of the number of
successes in the sample. The expected value, or mean value, of the
number of successes is therefore $np$; the variance is $npq$, and the
standard deviation is $\sqrt{npq}$. This s.d. is usually called the standard
error (s.e.) of the number of successes in a sample of size $n$, the
deviation from the expected value $np$ being looked upon as 'error'.
The proportion of successes in a sample is obtained by dividing the
number of successes by $n$. The expected value of the proportion of
successes is therefore $p$; and the s.d. of the proportion of successes
is $1/n$ times that of the number of successes, i.e. $\sqrt{pq/n}$. Thus

s.e. of the number of successes = $\sqrt{npq}$,

s.e. of the proportion of successes = $\sqrt{pq/n}$.

The precision of the proportion of successes observed in the sample
is regarded as inversely proportional to the s.e. of this proportion.
Hence the precision of the observed proportion varies as $\sqrt{n}$. In
particular, to double the precision it is necessary to increase the
size of the sample four-fold.
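
The two standard errors may be expressed as short functions. This is a sketch only; the function names are hypothetical, and the sample sizes 400 and 1,600 with $p = \tfrac{1}{2}$ are assumed merely to exhibit the four-fold rule.

```python
import math


def se_number(n, p):
    """Standard error of the number of successes in a simple sample of size n."""
    return math.sqrt(n * p * (1 - p))


def se_proportion(n, p):
    """Standard error of the proportion of successes in a simple sample of size n."""
    return math.sqrt(p * (1 - p) / n)


# The s.e. of the proportion is 1/n times that of the number of successes.
ratio_check = se_number(400, 0.5) / 400 - se_proportion(400, 0.5)

# Increasing the sample four-fold halves the s.e., i.e. doubles the precision.
gain = se_proportion(400, 0.5) / se_proportion(1600, 0.5)
print(ratio_check, gain)
```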

45. Large samples. Test of significance

As we have just seen, the sampling distributions of the number
and the proportion of successes in a simple sample of size $n$ are
binomial distributions. It was also shown in Chapter III that, for
large values of $n$, the binomial distribution approximates to a
normal distribution, in the sense that the probabilities for
corresponding intervals in the two distributions tend to equality as $n$
increases indefinitely. Now we know that the probability that a
random value of a normal variate, of s.d. $\sigma$, will lie outside the
interval which extends $3\sigma$ on each side of the mean is only 0.0027.
Similarly, the probability that the value will deviate from the
mean by more than $2\sigma$ is 0.0456, or about 4½ %. We may therefore
conclude that, for large values of $n$, the probability that the number
of successes in a simple sample of $n$ members will differ from the
mean by more than three times the s.e. is also very small; and that
a deviation of more than twice the s.e. is rather unusual.

Bearing this in mind we have a test of the credibility of the
hypothesis that a given large sample, of $n$ members, was obtained
by simple sampling from a population in which the relative frequency
of the occurrence of the attribute considered is $p$. Suppose it is
found that the number of successes in the sample differs from the
expected value $np$ by more than $3\sqrt{npq}$. Then an event has
happened which, on the hypothesis of simple sampling, is very
improbable. We conclude then that the truth of this hypothesis is itself
very improbable, and we say that the difference is highly significant.
Considerations of the aspects of the problem will then lead us to
suspect either that the value of $p$ employed is incorrect, or else that
the conditions of simple sampling were not observed. A deviation
from the mean less than twice the s.e. is regarded as not significant.
For deviations greater than twice the s.e. the significance increases
with the deviation. The dividing line between significance and
non-significance is, of course, not sharply defined. But significance is
usually regarded as beginning where the probability of a larger
deviation is less than 5 %. It may be remarked that, while the
above test may furnish evidence against the hypothesis, it cannot
prove the hypothesis to be correct. The most it can do in its favour
is to provide no evidence against it.

The above argument still holds if, in place of the number of
successes and its s.e., we employ the proportion of successes in the
sample and its s.e. The expected value of this proportion in simple
sampling is $p$; and we compare the deviation of the actual proportion,
from this value, with the s.e., $\sqrt{pq/n}$, of this proportion. In some
cases the value of $p$ in the population is not known, but must be
estimated from the sample. The estimate obtained from a large
sample may be used without serious error in place of the true value,
since the s.e. of the proportion of successes is small when $n$ is large.

Example. A certain cubical die was thrown 9,000 times, and a 5 or a 6 was
obtained 3,240 times. On the assumption of random throwing, do the data
indicate an unbiased die?

On the hypothesis of an unbiased die the chance of throwing a 5 or a 6 is
1/3. Thus $p = 1/3$ and $q = 2/3$. The expected number of successes is therefore
3,000, and the deviation of the actual number from this value is 240. The
s.e., $\epsilon$, of the number of successes is

$$\epsilon = \sqrt{npq} = \sqrt{9000 \times \tfrac{1}{3} \times \tfrac{2}{3}} = 10\sqrt{20} = 44.72.$$

The deviation 240 is nearly 5.4 times this s.e.; and it is therefore most
unlikely to appear as a result of simple sampling with $p = 1/3$. We therefore
conclude that the die is almost certainly biased, and that $p$ is not equal to 1/3.

The estimate of $p$ obtained from the sample is 3,240/9,000 = 0.36. The
s.e. of the proportion of successes is then

$$\epsilon' = \sqrt{0.36 \times 0.64/9{,}000} \approx 0.0050,$$

and $3\epsilon' = 0.015$. It is therefore most unlikely that the true value of $p$ lies
outside the range 0.36 ± 0.015. In other words, the true value of $p$ almost
certainly lies between 0.345 and 0.375.
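
The steps of this Example may be set out as a short calculation. The sketch below is illustrative; the function name `deviation_in_se_units` is hypothetical, and the limits of three standard errors follow the convention adopted in the text.

```python
import math


def deviation_in_se_units(successes, n, p):
    """Deviation of the observed number of successes from np, in units of sqrt(npq)."""
    expected = n * p
    se = math.sqrt(n * p * (1 - p))
    return (successes - expected) / se


# Test of the hypothesis p = 1/3 for the die of the Example.
ratio = deviation_in_se_units(3240, 9000, 1 / 3)

# Estimate of p from the sample, with limits at three standard errors.
p_hat = 3240 / 9000
se_hat = math.sqrt(p_hat * (1 - p_hat) / 9000)
lower, upper = p_hat - 3 * se_hat, p_hat + 3 * se_hat
print(round(ratio, 2), round(lower, 3), round(upper, 3))
```

A ratio exceeding 3 is treated as highly significant, and one below 2 as not significant, the region between being a matter of judgment as explained above.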

46. Comparison of large samples 

Let two populations, $P_1$ and $P_2$, be tested for the prevalence of a
certain attribute, by taking from them large simple samples of $n_1$
and $n_2$ members respectively; and let $p_1$ and $p_2$ be the observed
proportions of successes in the samples. Is the difference, $p_1 - p_2$,
significant of a real difference between the two populations with
respect to the given attribute? On the hypothesis that the
populations are similar in this respect, we may combine the samples to
estimate the common value of the relative frequency of the
occurrence of the attribute in the populations. This estimate is then

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}. \tag{1}$$

The s.e.'s of the proportions of successes in samples of $n_1$ and $n_2$
members are $\sqrt{pq/n_1}$ and $\sqrt{pq/n_2}$ respectively; and, since the
samples are independent, the variance $\epsilon^2$ of the difference of those
proportions is given by

$$\epsilon^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \tag{2}$$

in consequence of the theorem proved in §§ 10 and 16.

On the assumption that the populations are similar, the expected
value of the difference $p_1 - p_2$ is zero, for

$$E(p_1 - p_2) = E(p_1) - E(p_2) = 0.$$

The sampling distributions of $p_1$ and $p_2$ are approximately normal,
when $n_1$ and $n_2$ are large; and the same is true of their difference
since the samples are independent. Thus the distribution of $p_1 - p_2$
is approximately normal, with mean zero and s.d. $\epsilon$. The probability
that, in simple sampling, the difference $p_1 - p_2$ will be numerically
greater than $3\epsilon$ is therefore very small. The probability that it will
be greater than $2\epsilon$ is in the neighbourhood of 5 %; and any value
smaller than this is regarded as not significant, i.e. as providing no
evidence against the hypothesis.

Example 1. In a simple sample of 600 men from a certain large city, 400
are found to be smokers. In one of 900 from another large city, 450 are
smokers. Do the data indicate that the cities are significantly different with
respect to the prevalence of smoking among men?

Here $p_1 = 2/3$, $p_2 = 1/2$, so that $p_1 - p_2 = 1/6$. On the assumption that the
cities are alike with respect to the prevalence of smoking among men, we have
as our estimate of the common value of $p$,

$$p = \frac{850}{1500} = \frac{17}{30},$$

and the variance of the difference of the proportions for the two samples is

$$\epsilon^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = \frac{17}{30} \times \frac{13}{30}\left(\frac{1}{600} + \frac{1}{900}\right) = 0.000682.$$

Hence $\epsilon = 0.026$. The observed difference is greater than $6\epsilon$, and is therefore
highly significant. Our assumption that the populations are similar is
therefore almost certainly wrong.
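
The comparison of two large samples by (1) and (2) can be sketched as follows. The function name `pooled_difference` and its argument names are hypothetical; the figures are those of Example 1.

```python
import math


def pooled_difference(successes1, n1, successes2, n2):
    """Return (p1 - p2, epsilon, ratio) on the hypothesis of similar populations."""
    p1, p2 = successes1 / n1, successes2 / n2
    p = (successes1 + successes2) / (n1 + n2)         # pooled estimate, as in (1)
    variance = p * (1 - p) * (1 / n1 + 1 / n2)        # as in (2)
    epsilon = math.sqrt(variance)
    return p1 - p2, epsilon, (p1 - p2) / epsilon


difference, epsilon, ratio = pooled_difference(400, 600, 450, 900)
print(round(difference, 4), round(epsilon, 3), round(ratio, 1))
```

The ratio of the difference to its standard error exceeds 6 for these data, which is why the difference is described as highly significant.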


Next suppose that simple samples, of $n_1$ and $n_2$ members, are
drawn from populations in which the proportions are $p_1$ and $p_2$
respectively ($p_1 > p_2$). Is it likely that the proportions $p_1'$ and $p_2'$ in
the samples will be such that $p_1' - p_2' < 0$; in other words, is the real
difference between the populations likely to be hidden in sampling?
As before, the distribution of $p_1' - p_2'$ is approximately normal for
large values of $n_1$ and $n_2$; but the mean of the distribution is now
$p_1 - p_2$, and its variance, being the sum of the variances of $p_1'$ and
$p_2'$, is given by

$$\epsilon^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}. \tag{3}$$

In order that the sample value of $p_1' - p_2'$ should be negative, its
deviation from the mean must be on the negative side, and
numerically greater than $p_1 - p_2$. If $p_1 - p_2 > 3\epsilon$, the probability of this is
very small. If, however, $p_1 - p_2$ is much less than $2\epsilon$, such an event
would not be very unusual. For the particular case in which
$p_1 - p_2 = 2\epsilon$, the probability of the event is in or near the interval
2 to 2½ %.


Example 2. In two large populations there are 35 and 30 % of fair-haired
people. Is the difference likely to be revealed by simple samples of 1,500
and 1,000 respectively from the two populations?

Here $p_1 = 0.35$ and $p_2 = 0.30$, so that $p_1 - p_2 = 0.05$. The variance of the
difference of the proportions in the samples is

$$\epsilon^2 = \frac{(0.35)(0.65)}{1500} + \frac{(0.3)(0.7)}{1000} = 0.000362,$$

so that $\epsilon = 0.019$. The difference $p_1 - p_2$ is about $2.6\epsilon$. The probability that
the real difference between the populations will be hidden is approximately
the probability that, for a random value of a normal variate, the deviation
from the mean will be on the negative side and greater than 2.6 times the
s.d. Since this is less than ½ %, it is unlikely that the difference will be hidden.
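
The probability in Example 2 can be evaluated directly from the normal distribution. The following is a minimal sketch using Python's standard library; the function name `prob_difference_hidden` is an illustrative assumption, and the normal approximation is taken for granted, as in the text.

```python
import math
from statistics import NormalDist

def prob_difference_hidden(p1, n1, p2, n2):
    """Return (s.e., relative deviation, probability) that the sample
    difference of proportions is negative, by the normal approximation."""
    eps = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # formula (3)
    z = (p1 - p2) / eps
    return eps, z, NormalDist().cdf(-z)

# Data of Example 2: 35 % and 30 %, samples of 1,500 and 1,000.
eps, z, prob = prob_difference_hidden(0.35, 1500, 0.30, 1000)
print(eps, z, prob)
```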


47. Poissonian and Lexian sampling. Samples of varying size
Consider next a few modifications of the condition of simple
sampling, beginning with Poisson's series of trials (cf. § 11, Ex. 3).
Suppose that, in drawing a sample of attributes of $n$ members, the
chance of success changes at each drawing. Let $p_i$ be the probability
of success at the $i$th drawing. Then the expected value of the
number of successes in the sample is the sum of the expectations
at the individual drawings; and this is

$$\sum p_i = np,$$

where $p$ is the mean of the quantities $p_i$. Also, since the drawings
are independent, the variance $\epsilon^2$ of the number of successes in the
sample is the sum of the variances of the numbers of successes at
the separate drawings, so that

$$\epsilon^2 = \sum p_i q_i = \sum p_i - \sum p_i^2.$$

If $\sigma^2$ is the variance of the quantities $p_i$, we may write this

$$\epsilon^2 = np - n(p^2 + \sigma^2) = npq - n\sigma^2. \qquad (4)$$

Thus the variance of the number of successes is less than when the
probability remains constant and equal to $p$. Dividing by $n^2$ we
have the variance $\epsilon'^2$ of the proportion of successes in a Poissonian
sample:

$$\epsilon'^2 = \frac{pq}{n} - \frac{\sigma^2}{n}. \qquad (5)$$

Consider next a Lexian series of trials.* Suppose that, in taking
$N$ simple samples of attributes of $n$ members each, the probability
of success varies from one sample to another. Let $p_i$ be the value
in the $i$th sample ($i = 1, 2, \dots, N$). We wish to find the s.e. of the
number of successes per sample, when the records of all the samples
are pooled. Let $p$ be the mean value of the probability, so that
$Np = \sum p_i$. Then the expected value of the number of successes in
the whole series of samples is equal to the sum of the expected values
for the individual samples, and this is

$$\sum np_i = nNp.$$

Hence the expected value of the number of successes per sample is
$np$. To find the variance $\epsilon^2$ of the number of successes per sample,
we observe that the mean square deviation from $np$ in the $i$th
sample is

$$np_i q_i + (np_i - np)^2.$$

Summing for all the $N$ samples, and equating to $N\epsilon^2$, we have

$$N\epsilon^2 = n\sum p_i q_i + n^2 \sum (p_i - p)^2.$$

Now the last term has the value $n^2 N\sigma^2$, where $\sigma^2$ is the variance of
the quantities $p_i$. In the other sum we may write

$$\sum p_i q_i = \sum p_i - \sum p_i^2 = Np - N(\sigma^2 + p^2) = Npq - N\sigma^2.$$

Substituting these values in the equation, and dividing by $N$, we
have

$$\epsilon^2 = npq + n(n-1)\sigma^2, \qquad (6)$$

which is the required variance of the number of successes per sample.
Dividing by $n^2$ we have the variance of the proportion of successes
per sample, viz.

$$\frac{pq}{n} + \frac{n-1}{n}\sigma^2. \qquad (7)$$

Both the variances (6) and (7) are greater than in the case of simple
sampling with constant probability $p$.

* Studied by W. Lexis in 1877.
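
The Lexian result (6) admits the same kind of exact check. The sketch below averages the mean square deviation from $np$ over the $N$ samples and compares it with $npq + n(n-1)\sigma^2$; the sample size and the three probabilities are hypothetical, and the function names are illustrative only.

```python
from fractions import Fraction

def lexian_variance(n, probs):
    """Mean square deviation from n*p_bar, averaged over the N samples."""
    N = len(probs)
    p_bar = sum(probs) / N
    total = sum(n * p * (1 - p) + (n * p - n * p_bar) ** 2 for p in probs)
    return total / N

def formula_6(n, probs):
    """n*p*q + n*(n-1)*sigma^2, sigma^2 being the variance of the p_i."""
    N = len(probs)
    p_bar = sum(probs) / N
    sigma2 = sum((p - p_bar) ** 2 for p in probs) / N
    return n * p_bar * (1 - p_bar) + n * (n - 1) * sigma2

# Hypothetical series: three samples of 10 members each.
n = 10
probs = [Fraction(1, 5), Fraction(2, 5), Fraction(3, 5)]
direct = lexian_variance(n, probs)
via_formula = formula_6(n, probs)
print(direct, via_formula)
```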

Lastly, we may consider the modification of simple sampling in
which the probability $p$ of success remains constant, but the size $n$
of the sample varies about a mean $\bar{n}$ with variance $\sigma_n^2$. To find the
variance of the number of successes per sample, we observe first
that the mean number of successes per sample is $\bar{n}p$. Then, for
samples of size $n$, the mean square deviation of the number of
successes from $np$ is $npq$. Consequently the mean square deviation
from the general mean $\bar{n}p$ is

$$npq + (np - \bar{n}p)^2.$$

The expected value of this mean square is the required variance of
the number of successes, so that

$$\epsilon^2 = \bar{n}pq + p^2\sigma_n^2. \qquad (8)$$

We shall make use of this result in a later chapter.
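
Result (8) can likewise be confirmed exactly for a particular case. The sketch assumes, purely for illustration, that the sample sizes 8, 10 and 12 are equally likely and that $p = 1/4$; neither the data nor the function names come from the text.

```python
from fractions import Fraction

def variable_size_variance(sizes, p):
    """Expected mean square deviation from the general mean n_bar * p,
    the listed sample sizes being taken as equally likely."""
    q = 1 - p
    n_bar = Fraction(sum(sizes), len(sizes))
    total = sum(n * p * q + (n * p - n_bar * p) ** 2 for n in sizes)
    return total / len(sizes)

def formula_8(sizes, p):
    """n_bar*p*q + p^2 * (variance of the sample size)."""
    q = 1 - p
    n_bar = Fraction(sum(sizes), len(sizes))
    var_n = sum((n - n_bar) ** 2 for n in sizes) / len(sizes)
    return n_bar * p * q + p ** 2 * var_n

sizes = [8, 10, 12]          # hypothetical sample sizes
p = Fraction(1, 4)           # hypothetical constant probability
direct = variable_size_variance(sizes, p)
via_formula = formula_8(sizes, p)
print(direct, via_formula)
```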


SAMPLING OF VALUES OF A VARIABLE

48. Random and simple sampling
We pass now to the consideration of sampling of values of a
variable and measurable quantity, such as height, age, yield of
grain, etc. Each member of the population of individuals, objects
or experiments provides a value of the variable; and we thus have
a population of values of the variable, and the frequency distribution
determined by it. In drawing a sample of $n$ members from the
population we are choosing $n$ values of the variable from those of
the distribution.

We have already defined random sampling as sampling in which
each member of the population has the same chance of being chosen.
In the case of a population of discrete values of the variable $x$, the
value $x_i$ may occur $f_i$ times. There are then $f_i$ members of the population
each equal to $x_i$. Hence the probability of the value $x_i$ in the
selection of an individual by random sampling is $f_i/N$, which is the
relative frequency of that value in the population. Similarly, if the
population of values of $x$ has a continuous distribution with relative
frequency density $f(x)$, the probability that, in the random selection
of an individual, the value of the variable will fall in the interval
$dx$ is $f(x)\,dx$. Simple sampling is random sampling with the further
provision that, in the selection of an individual from the population,
the probability of obtaining a value of the variable within any specified
range remains constant throughout the sampling. In particular, with
a population of discrete values of the variable, the probability of
obtaining a specified value $x_i$ remains constant during the sampling.
Thus, in simple sampling, the system of probabilities associated with
any drawing is independent of the results of preceding drawings.

A population, whose distribution is continuous, contains an
infinite number of values in any finite interval in the range of the
variable. In drawing a finite random sample from such a population,
the probability associated with any interval remains unchanged,
and the sampling is therefore simple. Thus a finite random sample
from a population whose distribution is continuous is a simple sample.
It is the common practice to refer to such a sample as a 'random
sample'. If, however, the population contains only a limited number
of values, the sampling will not be simple unless each value selected
is returned to the population before the drawing of the next value.

49. Sampling distributions. Standard errors

The distribution of the variable in the population has its mean,
variance, moments of higher order, partition values, etc., which
are spoken of generally as the parameters of the population. Similarly,
each simple sample from the population determines a frequency
distribution of the variable, from which the mean, variance,
etc., of the sample may be calculated. These may be regarded as
estimates of the values of the parameters of the population. Any
such estimate obtained from the sample is called a statistic. More
generally any function of the sample values, used as an estimate of
a parameter of the population, is called a statistic.

When the distribution of the variable $x$ in the population is
known, it is theoretically possible to determine the probability
that the estimate $z$ of any parameter, obtained from a simple sample
of $n$ members, will lie in the interval $dz$. Let this probability be
denoted by $\phi(z)\,dz$. Then the density $\phi(z)$ determines a probability
distribution called the sampling distribution of that statistic for
simple samples of size $n$. Thus the sampling distribution is a continuous
distribution, determined by the nature of the population
and the size of the sample; and the s.d. of the sampling distribution
is called the standard error (s.e.) of that statistic for samples of $n$
members. The sampling distribution is often defined as the distribution
of the values of the statistic obtained from an infinite
(or very large) number of simple samples of the given size. This
alternative way of looking at it may be a help to the student. The
two definitions bear the same relation to each other as the à priori
definition of probability and the empirical definition. The reader
should bear in mind that a sampling distribution is essentially a
probability distribution.

Distributions of statistics, for random samples from a normal
population, will play a prominent part in later chapters. It was
shown in § 22 that the distribution of the means of such samples is
normal; and we know its s.d. In general, for normal populations, or
populations whose frequency curves are unimodal and only moderately
skew, the sampling distributions of many of the common
statistics approximate to the normal type as $n$ increases indefinitely;
so that, for large samples, they possess the property that the probability
of a sample value of the statistic deviating from its mean by
more than three times its s.e. is very small. This enables us to apply
the test of § 45 to statistics obtained from large samples; and the
determination of the s.e. in such cases is a matter of importance.
Tests appropriate to small samples will be considered in Chapter X.

50. Sampling distribution of the mean
The distribution of the means of random samples of given size,
from a specified population, is a question of considerable importance.
Let the sampled population have mean $\mu$ and variance $\sigma^2$. We shall
first prove by elementary methods that the sampling distribution
of the mean of random samples of size $n$ has $\mu$ for its mean and
$\sigma^2/n$ for its variance. We shall then prove more generally how all
the moments of the distribution of the mean may be deduced simply
from those of the population, by means of the properties of the
cumulative function.

Reasoning in terms of a population of discrete values, let the relative
frequency of the value $x_i$ in the population be $p_i$ ($i = 1, 2, \dots, k$).
Then

$$\mu = \sum p_i x_i \qquad (9)$$

and

$$\sigma^2 = \sum p_i (x_i - \mu)^2. \qquad (10)$$

In the case of a sample of one member, the distribution of the mean
is clearly the distribution of the variable in the population. For the
single value $x_s$ in the sample, associated with probability $p_s$, is the
mean of the sample. The expected value of the mean of the sample
is therefore

$$E(x_s) = \sum p_i x_i = \mu.$$

Consequently the variance of the distribution of the mean of a
sample of one is

$$E(x_s - \mu)^2 = \sum p_i (x_i - \mu)^2 = \sigma^2$$

as stated. For a sample of $n$ values $x_j$ ($j = 1, 2, \dots, n$), the mean $\bar{x}$
is given by

$$n\bar{x} = \sum x_j.$$

Taking the expected value of each member we have

$$E(n\bar{x}) = E\left(\sum x_j\right) = \sum E(x_j) = \sum \mu = n\mu,$$

so that

$$E(\bar{x}) = \mu. \qquad (11)$$

Thus the mean of the population is the mean of the sampling distribution
of $\bar{x}$. Further, since the values in the sample are independent,
the sampling variance of their sum, $n\bar{x}$, is the sum of the variances
of the separate values. But each of these values is a sample of one
member, whose distribution has variance $\sigma^2$. Consequently the
variance of $n\bar{x}$ is $n\sigma^2$, and its s.d. is $\sigma\sqrt{n}$. The s.d. of $\bar{x}$ is therefore
$\sigma/\sqrt{n}$. This is the required s.e. of the mean.
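
The two results just proved, that the sampling distribution of the mean has the population mean for its mean and $\sigma^2/n$ for its variance, may be checked by enumerating every possible simple sample from a small discrete population. The population below is hypothetical and chosen only to keep the enumeration short.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete population: values x_i with relative frequencies p_i.
values = [1, 2, 6]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 3  # sample size

mu = sum(p * x for p, x in zip(probs, values))
sigma2 = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

mean_of_xbar = Fraction(0)
var_of_xbar = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    prob = Fraction(1)
    for i in idx:
        prob *= probs[i]                 # the drawings are independent
    xbar = Fraction(sum(values[i] for i in idx), n)
    mean_of_xbar += prob * xbar
    var_of_xbar += prob * (xbar - mu) ** 2

print(mean_of_xbar, mu)
print(var_of_xbar, sigma2 / n)
```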

If the distribution of values in the population is continuous, with
relative frequency density $f(x)$, the relative frequency for the interval
$dx$ is $f(x)\,dx$; and this is the probability of a random value
coming from that interval. To adapt the above argument we have
only to replace $p_i$ by $f(x)\,dx$, $x_i$ by $x$, and summation with respect
to $i$ by integration extending over the values in the population.
Thus (9) and (10) become

$$\mu = \int x f(x)\,dx$$

and

$$\sigma^2 = \int (x - \mu)^2 f(x)\,dx.$$

Summation with respect to $j$ remains unaltered, since it extends
only over the $n$ values of the sample.

Example. A sample of 900 members is found to have a mean of 3.4 cm.
Could it be reasonably regarded as a simple sample from a large population,
whose mean is 3.25 cm. and s.d. 2.61 cm.?

The s.e. of the mean of a simple sample of 900 from such a population is
2.61/30 = 0.087 cm. The deviation of the mean of the sample from that of
the population is 0.15 cm. This deviation is less than twice the s.e. of the
mean, and is therefore not significant. We conclude that the given sample
might be one drawn from the population specified.

The various moments of the sampling distribution of $\bar{x}$ may be
found very simply* by using the additive property of cumulants,
proved in § 16. For, since $n\bar{x} = \sum x_j$, it follows from this property
that the cumulative function of $n\bar{x}$, being the sum of those of the
$n$ variates $x_j$, is given by

$$K(t;\, n\bar{x}) = nK(t;\, x) = n\kappa_1 t + n\kappa_2 t^2/2! + n\kappa_3 t^3/3! + \dots,$$

the cumulants $\kappa_r$ being those of the population, and the second
argument in brackets denoting the variate whose cumulative
function is indicated. But the $r$th moment of $\bar{x}$ is obtained from that
of $n\bar{x}$ by dividing by $n^r$. Hence the cumulative function of $\bar{x}$ is found
by substituting $t/n$ for $t$ in the above expansion. The cumulative
function of the distribution of the mean is therefore

$$K(t;\, \bar{x}) = \kappa_1 t + \frac{\kappa_2}{n}\,\frac{t^2}{2!} + \frac{\kappa_3}{n^2}\,\frac{t^3}{3!} + \dots$$

The $r$th cumulant for the distribution of the mean of the sample
is thus found from that of the population by dividing by $n^{r-1}$. In
particular, the mean of the distribution of $\bar{x}$ is $\kappa_1$, and is therefore
the same as the mean of the population. The variance of $\bar{x}$ is $\sigma^2/n$,
and the third moment about the mean of the distribution is $\kappa_3/n^2$.

* Cf. Fisher, 1929, 1, p. 202.

That the sampling distribution of the mean is approximately
normal for large samples, when the population has only moderate
skewness and excess, also follows from the results of § 16. For,
since the $r$th cumulant of the distribution of the mean is obtained
from that of the population by dividing by $n^{r-1}$, the skewness of the
distribution of the mean is

$$\frac{\kappa_3}{n^2} \div \left(\frac{\kappa_2}{n}\right)^{3/2} = \frac{1}{\sqrt{n}}\,\frac{\kappa_3}{\kappa_2^{3/2}} = \frac{1}{\sqrt{n}}\ (\text{skewness of population}).$$

Similarly, the excess of kurtosis of the distribution of the mean is

$$\frac{\kappa_4}{n^3} \div \frac{\kappa_2^2}{n^2} = \frac{1}{n}\,\frac{\kappa_4}{\kappa_2^2} = \frac{1}{n}\ (\text{excess of population}).$$

Thus, for large samples from a population of moderate skewness
and excess, the skewness and excess of the distribution of $\bar{x}$ are
small, and the distribution is approximately normal.
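
The way in which skewness and excess diminish with the size of the sample may be illustrated by exact enumeration. The sketch below assumes a small hypothetical population and takes the skewness in its squared form $\mu_3^2/\mu_2^3$ (so that the arithmetic stays rational) and the excess as $\mu_4/\mu_2^2 - 3$; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def central_moments(dist):
    """dist: list of (value, probability) pairs; returns (mu2, mu3, mu4)."""
    mean = sum(p * x for x, p in dist)
    return tuple(sum(p * (x - mean) ** r for x, p in dist) for r in (2, 3, 4))

def distribution_of_mean(dist, n):
    """Exact sampling distribution of the mean of n independent values."""
    out = {}
    for combo in product(dist, repeat=n):
        prob = Fraction(1)
        total = Fraction(0)
        for x, p in combo:
            prob *= p
            total += x
        xbar = total / n
        out[xbar] = out.get(xbar, 0) + prob
    return list(out.items())

# Hypothetical skew population.
population = [(Fraction(0), Fraction(1, 2)),
              (Fraction(1), Fraction(3, 10)),
              (Fraction(4), Fraction(1, 5))]
n = 4
m2, m3, m4 = central_moments(population)
M2, M3, M4 = central_moments(distribution_of_mean(population, n))

sq_skew_pop, sq_skew_mean = m3 ** 2 / m2 ** 3, M3 ** 2 / M2 ** 3
excess_pop, excess_mean = m4 / m2 ** 2 - 3, M4 / M2 ** 2 - 3
print(sq_skew_mean, sq_skew_pop / n)
print(excess_mean, excess_pop / n)
```

The squared skewness and the excess of the mean are each $1/n$ times the corresponding quantity for the population, so that the skewness itself falls off as $1/\sqrt{n}$.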

51. Normal population. Fiducial limits for unknown mean
Suppose that the population, from which the random sample of
$n$ values is drawn, is a normal population with mean $\mu$ and s.d. $\sigma$.
Then the sample values are independent normal variates, with the
same mean and s.d.; and, by the theorem of § 22, their mean $\bar{x}$ is
normally distributed with mean $\mu$ and s.d. $\sigma/\sqrt{n}$. If we know $\sigma$
but not $\mu$, there is a range of possible values of $\mu$ for which the
observed mean $\bar{x}$ of the sample is not significant at any specified
level of probability. In the sampling distribution of the mean, the
relative deviation of $\bar{x}$ from its expected value is the ratio of $\bar{x} - \mu$
to the s.e. of $\bar{x}$; and this ratio is $(\bar{x} - \mu)\sqrt{n}/\sigma$. If then the observed
value $\bar{x}$ is not significant at the 5 % level of probability, this
relative deviation must be less than that which, in a normal distribution,
is exceeded numerically with a probability of 5 %. Such
a relative deviation is 1.96. Consequently the observed mean $\bar{x}$
will be not significant provided

$$\frac{|\bar{x} - \mu|\sqrt{n}}{\sigma} < 1.96,$$

which requires

$$\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}.$$

The values $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ are called the 95 % fiducial limits, or confidence
limits, for the mean of the population corresponding to the
given sample. They are the limits within which $\mu$ must lie, in order
that the observed sample mean should not be significant at the
prescribed level of probability.

Similarly, we define the fiducial limits for other levels of probability.
Thus, in a normal distribution, the relative deviations
which are exceeded numerically with probabilities of 2 and 1 % are
2.33 and 2.58 respectively. Hence the 98 % fiducial limits for the
mean of the normal population, corresponding to the given sample,
are $\bar{x} \pm 2.33\,\sigma/\sqrt{n}$; and the 99 % fiducial limits are $\bar{x} \pm 2.58\,\sigma/\sqrt{n}$.

Example. Show that, in the example of the preceding section, if the
population is normal but its mean unknown, the 95 % fiducial limits for the
mean are 3.23 and 3.57 cm., and the 98 % fiducial limits 3.20 and 3.60 cm.
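
The limits asked for in this example may be computed as follows. The sketch takes the sample mean as 3.4 cm., the sample size as 900, and the s.d. as 2.61 cm. (the value implied by the standard error of 0.087 cm. found in the preceding section); the function name `fiducial_limits` is illustrative only.

```python
import math
from statistics import NormalDist

def fiducial_limits(xbar, sigma, n, level):
    """Limits xbar -/+ z*sigma/sqrt(n), where z is the relative deviation
    exceeded numerically with probability 1 - level in a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo95, hi95 = fiducial_limits(3.4, 2.61, 900, 0.95)
lo98, hi98 = fiducial_limits(3.4, 2.61, 900, 0.98)
print(round(lo95, 2), round(hi95, 2))
print(round(lo98, 2), round(hi98, 2))
```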


52. Comparison of the means of two large samples

Given two independent simple samples, of $n_1$ and $n_2$ members
respectively, we may wish to examine whether the difference of
their means may be accounted for by fluctuations of sampling, the
two samples being regarded as drawn from the same population of
s.d. $\sigma$. The s.e.'s of the means of samples of $n_1$ and $n_2$ members
from this population are $\sigma/\sqrt{n_1}$ and $\sigma/\sqrt{n_2}$ respectively. Hence, on
the assumption that the samples are independent and drawn from
this population, the s.e. $\epsilon$ of the difference of their means is given by

$$\epsilon^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \qquad (12)$$

The sampling distribution of this difference has zero for its mean,
and is approximately normal if $n_1$ and $n_2$ are large. Consequently,
if the observed difference of the means exceeds $3\epsilon$, it can hardly be
ascribed to fluctuations of sampling; and our assumption that the
samples were drawn from the same population is almost certainly
incorrect. If the difference is greater than $2\epsilon$, it is regarded as
significant at the 5 % level of probability. When the variance of the
population is not known, it may be estimated from the combined
sample of $n_1 + n_2$ members, unless the variances of the two samples
are inconsistent with the assumption that they were drawn from
the same population (see § 60 below).

If the two samples are known to have come from different populations,
with variances $\sigma_1^2$ and $\sigma_2^2$ respectively, we can test by a
similar procedure whether the two populations may have the same
mean. The standard errors of the means of samples of $n_1$ and $n_2$
members from the two populations are $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$ respectively;
and the s.e. $\epsilon$ of their difference is then given by

$$\epsilon^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \qquad (13)$$

On the assumption that the two populations have the same mean,
the distribution of the difference of the means of the samples has
zero for its expected value, and is approximately normal for large
samples. We may therefore test the significance of the difference of
the sample means, by comparing it with $\epsilon$ in the usual manner. As
before, if the variances of the populations are not known, they may
be estimated from the large samples.


Example. A simple sample of heights of 6,400 Englishmen has a mean of
67.85 in. and a s.d. of 2.56 in., while a simple sample of heights of 1,600
Australians has a mean of 68.55 in. and a s.d. of 2.52 in. Do the data indicate that
Australians are on the average taller than Englishmen?

The s.e. of the mean of a sample of heights of 6,400 Englishmen is
2.56/80 = 0.032 in., and that of the mean of a sample of heights of 1,600
Australians is 2.52/40 = 0.063 in. The s.e. of the difference of the means is

$$\epsilon = \sqrt{(0.032)^2 + (0.063)^2} = \sqrt{0.004993} = 0.07 \text{ in.}$$

The observed difference between the means of the samples is 0.70 in., which is
10 times its s.e. Hence the data are inconsistent with the assumption that
the means of the two populations are equal; and we conclude that Australians
are on the average taller than Englishmen.
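
The computation in this example follows formula (13), with the sample s.d.'s used as estimates of those of the populations. A minimal sketch, in which the function name is an illustrative assumption:

```python
import math

def se_difference_of_means(s1, n1, s2, n2):
    """S.e. of the difference of two independent sample means, formula (13)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Heights example: s.d. 2.56 in. (n = 6,400) and s.d. 2.52 in. (n = 1,600).
eps = se_difference_of_means(2.56, 6400, 2.52, 1600)
ratio = 0.70 / eps      # observed difference of the means in units of its s.e.
print(eps, ratio)
```

The ratio comes out slightly under 10 because the text rounds $\epsilon$ to 0.07 in. before dividing.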
53. Standard error of a partition value
Consider the s.e. of a partition value for a large random sample
of $n$ members, drawn from a continuous population in which the
relative frequency density of the variable is $f(x)$. Let the partition
value be that for which $p$ is the fraction of the frequency lying
above it, and $q$ the fraction below it. Also let $x_p$ be the partition
value for the population, and $x_p + \delta x_p$ that for a large sample of $n$
items. The relative frequency of values above $x_p$ in the population
is $p$. In the sample the relative frequency above $x_p + \delta x_p$ is $p$, and
above $x_p$ it is $p + \delta p$. Thus $\delta p$ is the relative frequency in the sample
for the interval $\delta x_p$. Now for a large sample $\delta x_p$ and $\delta p$ are small;
and $\delta p$ is, to within infinitesimals of higher order, the relative
frequency for the interval $\delta x_p$ in the population. Consequently

$$\delta p = f(x_p)\,\delta x_p = y\,\delta x_p,$$

where $y$ is the ordinate at $x_p$ for the frequency curve $y = f(x)$ of
the population. On squaring we have

$$(\delta p)^2 = y^2 (\delta x_p)^2.$$

Now $y$ is independent of the sample, and $\delta x_p$ and $\delta p$ vary about
zero as mean. Hence, on taking expected values of each side,
we obtain

$$E(\delta x_p)^2 = \frac{1}{y^2}\,E(\delta p)^2.$$

Now $E(\delta x_p)^2$ is the variance of the sampling distribution of $x_p$, and
its square root is the s.e. of this partition value. Similarly, $E(\delta p)^2$
is the variance of the proportion of successes in $n$ trials, with constant
probability $p$ that the value of the variable selected will be
greater than $x_p$; and we know that this variance is $pq/n$. Consequently,
on taking the square root of both sides of the above
equation, we obtain the required s.e. $\epsilon$ of the partition value* as

$$\epsilon = \frac{1}{y}\sqrt{\frac{pq}{n}}. \qquad (14)$$

If the value of $y$ for the population is not given, it may be estimated
from the frequency distribution of the large sample.

* See also Kendall, 1940, 4.
An important case is that in which the distribution of the variable
in the population is normal, with s.d. $\sigma$. The value of $p$ is the area
under the normal curve to the right of the partition value; and the
table of areas in § 21 enables us to read off the value of $x_p/\sigma$. The
table of ordinates for the normal curve in § 20 then gives the corresponding
value of $\sigma y$; and substitution of the values of $y$, $p$, $q$ and $n$
in (14) gives the required s.e. For example, in the case of the median

$$p = q = \tfrac{1}{2}, \qquad x_p/\sigma = 0, \qquad \sigma y = 0.3989.$$

Hence the s.e. of the median, for a large sample of $n$ members from
a normal population, is

$$\frac{\sigma}{0.3989}\sqrt{\frac{1}{4n}} = 1.25\,\frac{\sigma}{\sqrt{n}}.$$

This is 25 % greater than the s.e. of the mean. For the upper
quartile $p = \tfrac{1}{4}$; the area from the mean to the quartile is 0.25, giving

$$x_p/\sigma = 0.6745, \qquad \sigma y = 0.3178.$$

The s.e. of a quartile is therefore

$$\frac{\sigma}{0.3178}\sqrt{\frac{3}{16n}} = 1.363\,\frac{\sigma}{\sqrt{n}}.$$

Example. Prove in the same manner that, for large samples of $n$ members
from a normal population of s.d. $\sigma$,

s.e. of 1st and 9th deciles = $1.71\,\sigma/\sqrt{n}$,

s.e. of 2nd and 8th deciles = $1.43\,\sigma/\sqrt{n}$,

s.e. of 3rd and 7th deciles = $1.32\,\sigma/\sqrt{n}$,

s.e. of 4th and 6th deciles = $1.27\,\sigma/\sqrt{n}$.
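
The numerical factors for the median, quartiles and deciles can be reproduced without tables. The sketch below evaluates the factor $c$ in s.e. $= c\,\sigma/\sqrt{n}$ from formula (14) for a normal population, using the standard normal distribution in Python's `statistics` module in place of the tables of §§ 20 and 21; the function name is illustrative only.

```python
import math
from statistics import NormalDist

def partition_se_factor(p):
    """Factor c such that the s.e. of the partition value is c*sigma/sqrt(n),
    p being the fraction of the frequency lying above the partition value."""
    std = NormalDist()                 # standard normal: mean 0, s.d. 1
    x = std.inv_cdf(1 - p)             # partition value in units of sigma
    y = std.pdf(x)                     # ordinate, i.e. sigma * y in the text
    return math.sqrt(p * (1 - p)) / y

for label, p in [("median", 0.5), ("upper quartile", 0.25),
                 ("9th decile", 0.1), ("8th decile", 0.2),
                 ("7th decile", 0.3), ("6th decile", 0.4)]:
    print(label, round(partition_se_factor(p), 3))
```

By the symmetry of the normal curve the factor for a fraction $p$ above the partition value equals that for $1 - p$, which is why the deciles pair off as stated.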
EXAMPLES VI

1. A biased coin was thrown 400 times, and heads resulted 240
times. Find the s.e. of the proportion of heads in 400 throws; and
deduce that the probability of throwing heads in a single trial
almost certainly lies between 0.53 and 0.67.

2. A random sample of 500 pineapples was taken from a large
consignment, and 65 were found to be bad. Show that the s.e. of
the proportion of bad ones in a sample of this size is 0.015; and deduce
that the percentage of bad pineapples in the consignment almost
certainly lies between 8.5 and 17.5.

3. In a random sample of 800 adults from the population of a
certain large city, 600 are found to have dark hair. In a random
sample of 1,000 adults from the inhabitants of another large city,
700 are dark-haired. Show that the difference of the proportions of
dark-haired people is nearly 2.4 times the s.e. of this difference for
samples of the above sizes.

4. In two large populations there are 30 and 25 % respectively
of fair-haired people. Is this difference likely to be hidden in samples
of 1,200 and 900 respectively from the two populations? (The
difference, 0.05, in the proportions is more than 2½ times the s.e.
of this difference for such samples. Hence it is unlikely that the real
difference will be hidden.)

5. Given that, on the average, 4 % of insured men of age 66 die
within a year, and that 60 of a particular group of 1,000 such men
died within a year, show that this group cannot be regarded as a
representative sample, seeing that the actual deviation of the
proportion of deaths is more than three times the s.e. of the proportion
for samples of this size.

6. In sampling of attributes a sample is drawn containing an even
number $n$ of members. In drawing each of the first $\tfrac{1}{2}n$ members the
probability of success is $p$, and for each of the remaining ones it is
$1 - p$, the drawings being independent. Show that the expected
number of successes is $\tfrac{1}{2}n$, and the variance of the number of
successes is $npq$.
7. The sampling distribution of a certain statistic is normal,
with mean 2.0 and s.e. 1.5. Show that the probability that a simple
sample will yield a value of the statistic greater than 5.0 is 0.02275;
and the probability that the value found will lie outside the range
$-1.0$ to $5.0$ is 0.0455.

Also find $c$ if the probability of getting a value of the statistic
greater than $c$ is 0.07. (Ans. $c = 4.213$.)

8. A normal population has a mean of 0.1 and a s.d. of 2.1. Find
the probability that the mean of a simple sample of 900 members
will be negative. (Ans. 0.077 nearly.)

9. A sample of 900 members is found to have a mean of 3.47 cm.
Can it be reasonably regarded as a simple sample from a large
population with mean 3.23 cm. and s.d. 2.31 cm.? (Ans. No. The
deviation $\bar{x} - \mu$ is more than three times the s.e. of the mean.)

10. The means of simple samples of 1,000 and 2,000 are 67.5
and 68.0 in. respectively. Can the samples be regarded as drawn
from the same population of s.d. 2.5 in.? (Ans. No. The difference
of the means is more than 5 times the s.e. of the difference, which
is 0.097 in. nearly.)

11. Show that the s.e. of the 8th decile of a simple sample of 900
members drawn from a normal population of s.d. 2.5 cm. is 0.12 cm.
nearly.

12. Prove that the coefficient of correlation between errors in
the partition values corresponding to proportions $p$ and $p'$ of the
frequency lying above them is, for large samples from a continuous
population, $\sqrt{p'q/pq'}$, in which $p > p'$. In particular, for the lower
and upper quartiles this coefficient is 1/3. (Cf. Yule and Kendall,
1937, 1, p. 385.)

13. Using the value $\epsilon = 1.363\,\sigma/\sqrt{n}$ for the s.e. of the lower (or
upper) quartile in a large sample of $n$ values from a normal population
of s.d. $\sigma$, find the s.e. of the semi-interquartile range.

Since the interquartile range is the difference between the upper
and lower quartiles, and the correlation between errors in these
statistics in a large sample is 1/3 (cf. Ex. 12), the s.e. of the
interquartile range is, in virtue of § 32 (46),

$$\sqrt{\epsilon^2 + \epsilon^2 - \tfrac{2}{3}\epsilon^2} = 2\epsilon/\sqrt{3}.$$

The s.e. of the semi-interquartile range is therefore

$$\epsilon/\sqrt{3} = 1.363\,\sigma/\sqrt{3n} = 0.787\,\sigma/\sqrt{n}.$$

14. The quantities $x_i$ ($i = 1, 2, \dots, n$) are independent values
chosen at random, one from each of $n$ populations with the same
mean, and with variances $\sigma_i^2$. Show that the linear estimate of the
common mean, which has the least sampling variance, is that
obtained by weighting the $x$'s inversely as their variances. Also
prove that this minimum variance is $H/n$, where $H$ is the harmonic
mean of the variances $\sigma_i^2$. (Cf. Ex. IV, 6.)

15. From a finite population of $N$ values, with variance $\sigma^2$, a
random sample of $n$ values is drawn without replacements. Show
that the sampling variance of the mean of the sample is

$$(N - n)\sigma^2 / n(N - 1).$$

16. Variable Poissonian sampling. The three types of sampling
considered in § 47 are all included in the following. Suppose that, in
the drawing of a Poissonian sample, there are various types of
sampling, each type having its own probability. Let $P_k$ be the
probability of the $k$th type of Poissonian sample, $n_k$ the size of a
sample of this type, $p_{ki}$ ($i = 1, 2, \dots, n_k$) the probabilities of success
at the individual drawings in this type, so that the expected number
of successes in such a sample is $x_k = \sum_i p_{ki} = n_k p_k$, where $p_k$ is the
mean of the $p_{ki}$. In virtue of § 47 (4) the
variance of the number of successes per sample in this type is

$$\epsilon_k^2 = n_k p_k q_k - n_k \sigma_k'^2,$$

where $\sigma_k'^2$ is the variance of the $p_{ki}$ for the $k$th type of sampling. The
expected number of successes per sample when all types are
possible is

$$\bar{x} = \sum_k P_k x_k,$$

and the variance of the number of successes per sample for all types,
being the mean square deviation of the number of successes per
sample from $\bar{x}$, is given by

$$\epsilon^2 = \sum_k P_k\left[\epsilon_k^2 + (x_k - \bar{x})^2\right] = \sum_k P_k \epsilon_k^2 + \sigma_x^2,$$

where $\sigma_x^2$ is the variance of the $x_k$ for all types. Hence the general
formula*

$$\epsilon^2 = \sum_k P_k n_k p_k q_k - \sum_k P_k n_k \sigma_k'^2 + \sigma_x^2.$$

In the particular case, for example, in which all the $p_{ki}$ have a
common value, $p$, we have

$$\sigma_k' = 0, \qquad x_k = n_k p, \qquad \sigma_x^2 = p^2 \sigma_n^2,$$

and the above formula becomes

$$\epsilon^2 = \sum_k P_k n_k\, pq + p^2\sigma_n^2 = \bar{n}pq + p^2\sigma_n^2,$$

which is § 47 (8).

17. In a certain population the proportion of members possessing
a given characteristic is $p$. Prove that, if $p$ may be varied, the
probability of obtaining $m$ such members in a simple sample of $n$
from the population is greatest when $p = m/n$.

18. Prove that, in simple sampling from a population, the
expected value of the $r$th moment of the sample about any fixed
value is equal to the $r$th moment of the population about that
value.

* Cf. Aitken, 1939, 1, pp. 63-4 and Coolidge, 1926, 2, pp. 66-72.


CHAPTER VII


STANDARD ERRORS OF STATISTICS

54. Notation. Variances of population and sample
Having already considered the s.e.'s of the mean and the partition
values, we now pass to those of various other statistics. We shall
argue in terms of a population of discrete values $x_i$ ($i = 1, 2, \dots, k$),
the relative frequency of the value $x_i$ in the population being $p_i$.
The argument covers approximately the case in which the values
of the population are grouped in $k$ classes of small class interval,
the $i$th class being centred at the value $x_i$, and having a relative
frequency $p_i$. The sum of the relative frequencies of all the other
classes is therefore $q_i$, where as usual $p_i + q_i = 1$. Then, as in § 50,
the mean $\mu$ of the population is given by

$$\mu = \sum_{i=1}^{k} p_i x_i, \qquad (1)$$

and its variance by

$$\sigma^2 = \sum p_i (x_i - \mu)^2 = \sum p_i x_i^2 - \mu^2. \qquad (2)$$

Consider now a simple sample of $n$ members drawn from this
population. The sample values fall into the same classes as above;
that is to say, the values $x_i$ are the same for the sample as for the
population, but the relative frequencies of the classes vary with the
sample, fluctuating about the corresponding values for the population.
The probability that, in simple sampling, a value drawn will
belong to the $i$th class is $p_i$. Hence the frequency $f_i$ of that class, in
a sample of $n$ members, has mean value $np_i$, since this is the expected
value of the number of successes in $n$ independent trials with constant
probability $p_i$ of success. Thus

$$E(f_i) = np_i. \qquad (3)$$

The mean $\bar{x}$ of the sample is given by

$$n\bar{x} = \sum f_i x_i, \qquad (4)$$

and its variance $S^2$ by

$$nS^2 = \sum f_i (x_i - \bar{x})^2 = \sum f_i x_i^2 - n\bar{x}^2. \qquad (5)$$
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We may notice at the outset that the expected Talue of 5* in 
sampling is not <r* but (n— l)<r*/n. To prove this we observe that 
8*, being the variance of the sample, is the second moment of the 
sample about ft, diminished by (x— /t)‘. Thus 

Consider the expectation of both members of this equation. In the 
first term on the right remains constant during sampling; 

and therefore, in virtue of (3), the exjwcted value of this term is 

[(*<-/«)“ W*)] = S !><(*<-/*)* = or*. 

Further, E(x—fi)* is the sampling variance of the mean, and is 
therefore equal to a^jn. Substituting these values we obtain 

= ( 6 ) 

n n 

as stated above. If then we write 

= = (7) 

our result is E{s*) — a*. (8) 

We' say that a* is an unbiased estimate of <r*, because its mean value 
in sampling is o*. It is in this sense a better estimate of the popula- 
tion variance than 8*. Incidentally also it follows from the above 
argument that the second moment of the sample, about the mean 
of the population, has a* for its mean value. 
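
The results (6) and (8) can be checked exactly on a small case. The following is only a sketch under assumed values: the three-class population, its probabilities and the sample size are hypothetical, and the function name is ours. It enumerates every ordered sample of $n$ members and averages $S^2$ with the proper weights, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def expected_sample_variance(values, probs, n):
    """Exact E(S^2), averaging over all ordered samples of n independent draws."""
    total = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in sample:
            weight *= probs[i]          # probability of this ordered sample
        xs = [values[i] for i in sample]
        mean = Fraction(sum(xs), n)
        s2 = sum((x - mean) ** 2 for x in xs) / n   # sample variance S^2, divisor n
        total += weight * s2
    return total

# Hypothetical population: class values and relative frequencies p_i.
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4

mu = sum(p * x for p, x in zip(probs, values))
sigma2 = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))
e_S2 = expected_sample_variance(values, probs, n)

print(e_S2, Fraction(n - 1, n) * sigma2)   # the two sides of (6)
```

Multiplying `e_S2` by $n/(n-1)$ gives the expected value of $s^2$, which should coincide with `sigma2` as in (8).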

55. Standard errors of class frequencies 
We have seen that, in simple sampling from the above population,
the probability that a value will fall into the $i$th class of the sample
is $p_i$, and that the frequency $f_i$ of this class has mean value $np_i$.
The deviation of this frequency from its mean value will be denoted
by $\delta f_i$. The s.e. of $f_i$ is the square root of the expected value of
$(\delta f_i)^2$; and, since $f_i$ is the number of successes in $n$ trials of constant
probability $p_i$, its variance is $np_iq_i$. Thus

$$E(\delta f_i)^2 = np_iq_i = np_i(1 - p_i). \tag{9}$$

If $p_i$ is unknown, an estimate of its value derived from the sample
is $f_i/n$; and, as an approximation to the above sampling variance,
we have

$$E(\delta f_i)^2 \approx f_i\left(1 - \frac{f_i}{n}\right). \tag{10}$$

But this is a fair approximation only if $n$ is large. A better estimate
is obtained by multiplying this expression by $n/(n-1)$. To prove
this we consider the expected value of the second member of (10).
Since $E(f_i^2)$ is the second moment of $f_i$ about zero in its sampling
distribution, we have

$$E(f_i^2) = \text{sampling variance of } f_i + [E(f_i)]^2 = np_i(1 - p_i) + n^2p_i^2.$$

Consequently

$$E\left(f_i - \frac{f_i^2}{n}\right) = np_i - \left[p_i(1 - p_i) + np_i^2\right] = (n-1)\,p_i(1 - p_i) = \frac{n-1}{n}\,E(\delta f_i)^2.$$

Hence the formula

$$E(\delta f_i)^2 \approx \frac{n}{n-1}\,f_i\left(1 - \frac{f_i}{n}\right) \tag{10'}$$

gives a better estimate than (10), since the expected value of the
second member of (10') is equal to the first member, so that it is an
unbiased estimate of the sampling variance of $f_i$. For a large sample
the factor $n/(n-1)$ is nearly equal to unity, and (10) gives a sufficiently
good estimate.
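
As an illustration, the expected values of the second members of (10) and (10') may be computed exactly from the binomial distribution of $f_i$. This is a minimal sketch: the values of $n$ and $p_i$ are assumed for the example, and the helper `expectation` is our own name, not part of any library.

```python
from fractions import Fraction
from math import comb

def expectation(g, n, p):
    """E[g(f)] when f is the number of successes in n trials of probability p."""
    return sum(comb(n, f) * p**f * (1 - p)**(n - f) * g(f) for f in range(n + 1))

n, p = 12, Fraction(1, 4)          # assumed sample size and class probability

true_var = n * p * (1 - p)                                          # formula (9)
naive = expectation(lambda f: f * (1 - Fraction(f, n)), n, p)       # mean of (10)
corrected = expectation(
    lambda f: Fraction(n, n - 1) * f * (1 - Fraction(f, n)), n, p)  # mean of (10')

print(true_var, naive, corrected)
```

The value `naive` falls short of `true_var` by the factor $(n-1)/n$, while `corrected` reproduces it exactly.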

56. Covariance of the frequencies in different classes 
Since the sum of the class frequencies in any sample is constant,
for samples of given size $n$, it follows that

$$\sum_i \delta f_i = 0. \tag{11}$$

The covariance between the frequencies $f_i$ and $f_j$ of the $i$th and $j$th
classes is the covariance, $E(\delta f_i\,\delta f_j)$, of their deviations $\delta f_i$ and $\delta f_j$
from their mean values $np_i$ and $np_j$. The calculation of this covariance
may be associated with a correlation table in the values
of $\delta f_i$ and $\delta f_j$. Consider the array in which $\delta f_i$ has a fixed value.
Bearing (11) in mind we assume that, in all the samples for which
$\delta f_i$ has this fixed value, the excess $\delta f_i$ produces an equal deficiency
which is distributed among the other class frequencies, on the
average, in proportion to their expected values; that is to say, we
assume that, in the array with constant $\delta f_i$, the average value
of $\delta f_j$ is

$$\overline{\delta f_j} = -\frac{p_j}{1 - p_i}\,\delta f_i = -\frac{p_j}{q_i}\,\delta f_i. \tag{12}$$

In calculating the covariance we may, as proved in §33, replace
each value of $\delta f_j$ in the array by the mean value $\overline{\delta f_j}$ for that array.
Then

$$E(\delta f_i\,\delta f_j) = E(\delta f_i\,\overline{\delta f_j}) = -\frac{p_j}{q_i}\,E(\delta f_i)^2 = -np_ip_j \tag{13}$$

in virtue of (9). This is the required covariance.

If the values of $p_i$ and $p_j$ are unknown, an approximation is
obtained by taking them as $f_i/n$ and $f_j/n$. The second member of
(13) is then replaced by $-f_if_j/n$. A better estimate, however, is
given by

$$E(\delta f_i\,\delta f_j) \approx -\frac{f_if_j}{n-1}. \tag{14}$$

This may be shown by determining the expected value of the second
member. Thus, in virtue of §23 (5),

$$E(f_if_j) = E(f_i)\,E(f_j) + E(\delta f_i\,\delta f_j),$$

and therefore, by (13) and (3),

$$E(f_if_j) = n^2p_ip_j - np_ip_j = n(n-1)\,p_ip_j.$$

Consequently, (14) gives an unbiased estimate of the covariance,
since the expected value of the second member is equal to the covariance.
In the case of a large sample, the denominator $n - 1$ may
be treated as $n$.
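
The covariance (13) and the unbiasedness of (14) can be verified exactly by enumeration. The sketch below assumes a hypothetical population of three classes and a small sample size; the function name is ours.

```python
from fractions import Fraction
from itertools import product

def class_frequency_moments(probs, n, i, j):
    """Exact E(df_i df_j) and E(f_i f_j) over all ordered samples of size n."""
    cov = Fraction(0)
    e_fifj = Fraction(0)
    for sample in product(range(len(probs)), repeat=n):
        weight = Fraction(1)
        for k in sample:
            weight *= probs[k]                     # probability of this sample
        fi, fj = sample.count(i), sample.count(j)  # class frequencies f_i, f_j
        cov += weight * (fi - n * probs[i]) * (fj - n * probs[j])
        e_fifj += weight * fi * fj
    return cov, e_fifj

probs = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]   # assumed p_i
n = 5
cov, e_fifj = class_frequency_moments(probs, n, 0, 1)

print(cov, -n * probs[0] * probs[1])   # both sides of (13)
print(-e_fifj / (n - 1))               # expected value of the estimate (14)
```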

57. Standard errors in moments about a fixed value 
It is customary to employ Greek symbols for moments of the
population, and the corresponding Italics for those of the sample.
Thus $\mu_r$ and $\mu'_r$ denote respectively the $r$th moment of the population
about the mean, and about some other specified value, while $m_r$
and $m'_r$ denote the corresponding moments of the sample. For the
$r$th moment of the sample about $x = 0$ we have

$$nm'_r = \sum_i f_i x_i^r.$$

Hence, owing to deviations $\delta f_i$ of the class frequencies from their
mean values $np_i$, we have a deviation $\delta m'_r$ of the $r$th moment from
its mean value, given by

$$n\,\delta m'_r = \sum_i x_i^r\,\delta f_i.$$

On squaring both sides we obtain

$$n^2(\delta m'_r)^2 = \sum_i x_i^{2r}(\delta f_i)^2 + {\sum_{i,j}}' x_i^r x_j^r\,\delta f_i\,\delta f_j,$$

where $\sum'$ denotes summation over all integral values of $i$ and $j$ from
1 to $\nu$, except those for which $i = j$. Taking expected values of both
members of this equation we deduce, since the values $x_i$ are the
same for each sample,

$$n^2E(\delta m'_r)^2 = \sum_i x_i^{2r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^r x_j^r\,np_ip_j.$$

Consequently, on dividing by $n$ and rearranging,

$$nE(\delta m'_r)^2 = \sum_i p_i x_i^{2r} - \Big(\sum_i p_i x_i^r\Big)^2 = \mu'_{2r} - \mu'^2_r,$$

the moments $\mu'_{2r}$ and $\mu'_r$ being those of the population. Hence the
required formula

$$E(\delta m'_r)^2 = \frac{1}{n}\left(\mu'_{2r} - \mu'^2_r\right). \tag{15}$$

If, however, on taking expected values as above we use the estimates
(10') and (14), we obtain the unbiased estimate of the sampling
variance of $m'_r$

$$E(\delta m'_r)^2 \approx \frac{1}{n-1}\left(m'_{2r} - m'^2_r\right) \tag{16}$$

in terms of the moments of the sample.

Since the mean of a distribution is the first moment about $x = 0$,
we may deduce the variance of the mean of a sample of $n$ members
by putting $r = 1$ in the above. Thus (15) becomes

$$E(\delta\bar{x})^2 = \frac{1}{n}\left(\mu'_2 - \mu^2\right) = \frac{\sigma^2}{n}, \tag{17}$$

and (16) gives, in terms of sample moments,

$$E(\delta\bar{x})^2 \approx \frac{1}{n-1}\left(m'_2 - \bar{x}^2\right) = \frac{S^2}{n-1} = \frac{s^2}{n}, \tag{18}$$

in agreement with (8).
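
Formula (15) may be tested on a small population by exact enumeration. The following is a sketch only; the population values, the probabilities, and the choice $n = 4$, $r = 2$ are assumptions made for the illustration, and the function names are ours.

```python
from fractions import Fraction
from itertools import product

def variance_of_raw_moment(values, probs, n, r):
    """Exact sampling variance of m'_r over all ordered samples of size n."""
    first = Fraction(0)
    second = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in sample:
            weight *= probs[i]
        m_r = Fraction(sum(values[i] ** r for i in sample), n)  # sample moment m'_r
        first += weight * m_r
        second += weight * m_r ** 2
    return second - first ** 2

def population_raw_moment(values, probs, r):
    """Population moment mu'_r about x = 0."""
    return sum(p * x ** r for p, x in zip(probs, values))

values = [-1, 0, 2]                                        # assumed class values
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]   # assumed p_i
n, r = 4, 2

exact = variance_of_raw_moment(values, probs, n, r)
formula_15 = (population_raw_moment(values, probs, 2 * r)
              - population_raw_moment(values, probs, r) ** 2) / n

print(exact, formula_15)
```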

58. Covariance of moments of different orders about a fixed value 
The calculation of the covariance of the $q$th and $r$th moments of
the sample about the fixed value $x = 0$ is similar to the above. With
the same notation we have

$$n\,\delta m'_q = \sum_i x_i^q\,\delta f_i \quad\text{and}\quad n\,\delta m'_r = \sum_i x_i^r\,\delta f_i.$$

On multiplying corresponding sides of these equations we obtain

$$n^2\,\delta m'_q\,\delta m'_r = \sum_i x_i^{q+r}(\delta f_i)^2 + {\sum_{i,j}}' x_i^q x_j^r\,\delta f_i\,\delta f_j.$$

Now take the expected value of each member of the equation.
Then, in virtue of (9) and (13),

$$n^2E(\delta m'_q\,\delta m'_r) = \sum_i x_i^{q+r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^q x_j^r\,np_ip_j,$$

and consequently, on dividing by $n$ and rearranging,

$$nE(\delta m'_q\,\delta m'_r) = \sum_i p_i x_i^{q+r} - \Big(\sum_i p_i x_i^q\Big)\Big(\sum_j p_j x_j^r\Big) = \mu'_{q+r} - \mu'_q\mu'_r.$$

Hence the required formula

$$E(\delta m'_q\,\delta m'_r) = \frac{1}{n}\left(\mu'_{q+r} - \mu'_q\mu'_r\right) \tag{19}$$

in terms of the moments of the population. If, however, on taking
expected values as above, we use the approximations (10') and (14),
we obtain the unbiased estimate

$$E(\delta m'_q\,\delta m'_r) \approx \frac{1}{n-1}\left(m'_{q+r} - m'_qm'_r\right) \tag{20}$$

in terms of moments of the sample.

59. Standard errors of the variance and the standard deviation of a large sample

The mean of a sample changes with the sample. Hence we cannot
use (15) to obtain the sampling variance of the second moment
about the mean of the sample. In the case of large samples we may
obtain the required result as follows. The variance of the sample
is connected with the second moment about $x = 0$ by the usual
relation

$$m_2 = m'_2 - \bar{x}^2.$$

Consequently

$$\delta m_2 = \delta m'_2 - 2\bar{x}\,\delta\bar{x},$$

and therefore, on squaring,

$$(\delta m_2)^2 = (\delta m'_2)^2 - 4\bar{x}\,\delta\bar{x}\,\delta m'_2 + 4\bar{x}^2(\delta\bar{x})^2, \tag{i}$$

where

$$\delta\bar{x} = \frac{1}{n}\sum_i x_i\,\delta f_i, \qquad \delta m'_2 = \frac{1}{n}\sum_i x_i^2\,\delta f_i. \tag{ii}$$

Suppose now that the origin of $x$ is taken to coincide with the mean
of the population. Then $\bar{x}$ becomes identical with $\delta\bar{x}$, for it is now the
deviation of the mean of the sample from the mean of the population.
If we take expected values of both members of (i) we may
show that, when $n$ is large, the contributions of the second and third
terms on the right are small compared with that of the first term.
By (15), since the origin is now the mean of the population, the
expected value of $(\delta m'_2)^2$ is $(\mu_4 - \mu_2^2)/n$. Also $\bar{x}^2(\delta\bar{x})^2$ is $(\delta\bar{x})^4$, and its
expected value is of the order $1/n^2$, in virtue of (17). Similarly,
$\bar{x}\,\delta\bar{x}\,\delta m'_2$ is $(\delta\bar{x})^2\,\delta m'_2$, and its expected value is of the order $1/n^{3/2}$, in
virtue of (15) and (17). If then $n$ is large, the expectation of the
first term on the right of (i) is large compared with those of the
second and third terms, and we have the approximate formula*

$$E(\delta m_2)^2 = \frac{\mu_4 - \mu_2^2}{n}. \tag{21}$$

* Fisher has shown that, for samples of any size, the exact sampling
variance of $s^2$ (the unbiased estimate of population variance) is
$(\mu_4 - 3\mu_2^2)/n + 2\mu_2^2/(n - 1)$ (Fisher, 1929, 1, p. 206).
For a normal population this expression has the value $2\sigma^4/(n - 1)$.

The s.e. of the variance of the sample is the square root of this
quantity. In the particular case of a normal population, with s.d. $\sigma$,
we have $\mu_4 = 3\sigma^4$ and $\mu_2 = \sigma^2$; so that, for large samples from a
normal population,

$$E(\delta m_2)^2 = 2\sigma^4/n. \tag{22}$$

From (21) may be deduced the s.e. of the s.d. of large samples.
For the variance of the sample we have, with the above notation,
$m_2 = S^2$, and therefore

$$\delta m_2 = 2S\,\delta S.$$

For large samples the factor $S$ may be taken as equal to the population
parameter $\sigma$, $\delta S$ being small. Hence on squaring, and taking
expected values of both members, we have

$$E(\delta m_2)^2 = 4\sigma^2E(\delta S)^2,$$

and therefore, in virtue of (21),

$$E(\delta S)^2 = \frac{\mu_4 - \mu_2^2}{4n\mu_2}. \tag{23}$$

In the case of large samples from a normal population this is simply

$$E(\delta S)^2 = \frac{\sigma^2}{2n}, \tag{24}$$

and the s.e. of the s.d. is, in this case, $\sigma/\sqrt{2n}$.
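
Formula (23) is easily put in computational form. The sketch below is illustrative only: the function name and the numerical values of $\sigma$ and $n$ are assumed. It confirms that, with the normal values $\mu_2 = \sigma^2$ and $\mu_4 = 3\sigma^4$, (23) reduces to the s.e. $\sigma/\sqrt{2n}$ of (24).

```python
from math import sqrt

def se_of_sd(mu2, mu4, n):
    """Formula (23): large-sample s.e. of the s.d. of a sample of n members."""
    return sqrt((mu4 - mu2 ** 2) / (4.0 * n * mu2))

sigma, n = 3.0, 5000                                   # assumed values
normal_case = se_of_sd(sigma ** 2, 3 * sigma ** 4, n)  # normal: mu_4 = 3 sigma^4

print(normal_case, sigma / sqrt(2 * n))
```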


60. Comparison of the standard deviations of two large samples
For two independent large samples of $n_1$ and $n_2$ members from
the same population, the variance $\epsilon^2$ of the difference of their
standard deviations, being the sum of the variances of their standard
deviations, is, in virtue of (23),

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{25}$$

In particular, if the population is normal, this equation takes the
simple form

$$\epsilon^2 = \tfrac{1}{2}\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{26}$$

The expected value of the difference of the s.d.'s of two such
samples is zero. Hence, by comparing the actual difference $S_1 - S_2$
with the s.e. $\epsilon$, we may test in the usual manner the credibility of
the hypothesis that the samples were drawn from the same population.

For two large simple samples from different populations, whose
second and fourth moments about the mean are $\mu_2, \mu_4$ and $\bar{\mu}_2, \bar{\mu}_4$
respectively, the s.e. of the difference
of their standard deviations is given by

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4n_1\mu_2} + \frac{\bar{\mu}_4 - \bar{\mu}_2^2}{4n_2\bar{\mu}_2}. \tag{27}$$

And in particular, if the populations are normal with standard
deviations $\sigma_1$ and $\sigma_2$ respectively,

$$\epsilon^2 = \frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}. \tag{28}$$

Example. The s.d. of a simple sample of 2,000 members is 5.9 years, and
that of an independent sample of 2,500 members is 6.1 years. May the
samples be reasonably regarded as from the same normal population?

On the hypothesis that they are from the same normal population, an
approximate value for its s.d. is 6 years, and the s.e. of the difference of the
standard deviations of samples of the above sizes is, by (26),

$$\epsilon = 6\sqrt{\tfrac{1}{4000} + \tfrac{1}{5000}} = 6(0.0212) = 0.127 \text{ year}.$$

The actual difference of 0.2 year is less than $1.6\epsilon$, and is therefore not
significant at the 5 % level of probability. There is thus no real evidence against
the hypothesis.
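
The arithmetic of this example may be reproduced as follows. The function name is ours, and the value 6 years for the population s.d. is the rough pooled value adopted in the text; the sketch merely evaluates (26).

```python
from math import sqrt

def se_sd_difference_normal(sigma, n1, n2):
    """Formula (26): s.e. of S1 - S2 for large samples from one normal population."""
    return sigma * sqrt(0.5 * (1.0 / n1 + 1.0 / n2))

s1, n1 = 5.9, 2000
s2, n2 = 6.1, 2500
sigma = 6.0                      # approximate population s.d., as in the text

eps = se_sd_difference_normal(sigma, n1, n2)
ratio = abs(s1 - s2) / eps       # observed difference in units of its s.e.

print(round(eps, 3), round(ratio, 2))
```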


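Sampling from a Bivariate Population

61. Sampling covariance of the means of the variables
Suppose now that the population is bivariate, with $p_i$ as relative
frequency of the pair of values $(x_i, y_i)$, or of the class with centre
at that point. The means of $x$ and $y$ in the population are then

$$\mu = \sum_i p_i x_i, \qquad \mu' = \sum_i p_i y_i,$$

and the moments of orders $q$ and $r$ in $x$ and $y$ respectively are

$$\mu'_{q,r} = \sum_i p_i x_i^q y_i^r, \qquad \mu_{q,r} = \sum_i p_i (x_i - \mu)^q (y_i - \mu')^r,$$

the first about the point (0, 0), and the second about the mean of
the population. In particular $\mu_{1,1}$ is the covariance of $x$ and $y$ in
the population.

In a simple sample of $n$ pairs of values, let $f_i$ be the frequency of
the $i$th class. Then the values of the moments of the sample are
given by

$$nm'_{q,r} = \sum_i f_i x_i^q y_i^r, \qquad nm_{q,r} = \sum_i f_i (x_i - \bar{x})^q (y_i - \bar{y})^r,$$

the covariance of $x$ and $y$ in the sample being $m_{1,1}$. The argument
of §§ 55 and 56 still holds, and the formulae

$$E(\delta f_i)^2 = np_i(1 - p_i), \qquad E(\delta f_i\,\delta f_j) = -np_ip_j$$

are still valid.

Corresponding to (17) we may now prove that the covariance of
the means $\bar{x}$, $\bar{y}$, in samples from a bivariate population, is $\mu_{1,1}/n$. For

$$n\bar{x} = \sum_i f_i x_i, \qquad n\bar{y} = \sum_i f_i y_i.$$

Hence the deviations $\delta\bar{x}$, $\delta\bar{y}$ from $\mu$, $\mu'$ respectively, and the deviations
$\delta f_i$ from their mean values $np_i$, are connected by

$$n\,\delta\bar{x} = \sum_i x_i\,\delta f_i, \qquad n\,\delta\bar{y} = \sum_i y_i\,\delta f_i,$$

and therefore on multiplication

$$n^2\,\delta\bar{x}\,\delta\bar{y} = \sum_i x_iy_i(\delta f_i)^2 + {\sum_{i,j}}' x_iy_j\,\delta f_i\,\delta f_j.$$

Taking the expected value of each member we deduce

$$n^2E(\delta\bar{x}\,\delta\bar{y}) = \sum_i x_iy_i\,np_i(1 - p_i) - {\sum_{i,j}}' x_iy_j\,np_ip_j,$$

and therefore

$$nE(\delta\bar{x}\,\delta\bar{y}) = \sum_i p_ix_iy_i - \Big(\sum_i p_ix_i\Big)\Big(\sum_i p_iy_i\Big) = \mu'_{1,1} - \mu\mu' = \mu_{1,1}.$$

Hence the required result

$$E(\delta\bar{x}\,\delta\bar{y}) = \frac{\mu_{1,1}}{n}. \tag{29}$$

From this it follows immediately that the coefficient of correlation
between $\bar{x}$ and $\bar{y}$ is equal to the correlation $\rho$ between $x$ and $y$ in the
population. For the correlation between $\bar{x}$ and $\bar{y}$ is

$$\frac{\text{covariance of } \bar{x} \text{ and } \bar{y}}{(\text{s.e. of } \bar{x})(\text{s.e. of } \bar{y})} = \frac{\mu_{1,1}/n}{(\sigma_1/\sqrt{n})(\sigma_2/\sqrt{n})} = \frac{\mu_{1,1}}{\sigma_1\sigma_2} = \rho,$$

$\sigma_1^2$ and $\sigma_2^2$ being the variances of $x$ and $y$ in the population.

We may also prove that the expected value of the covariance of
$x$ and $y$ in the sample is given by

$$E(m_{1,1}) = \frac{n-1}{n}\,\mu_{1,1}, \tag{30}$$

which corresponds to (6). For, by the known relation, § 23 (6),

$$m_{1,1} = \frac{1}{n}\sum_i f_i (x_i - \mu)(y_i - \mu') - (\bar{x} - \mu)(\bar{y} - \mu').$$

Taking the expected value of each member we have, in virtue of (29),

$$E(m_{1,1}) = \mu_{1,1} - \frac{\mu_{1,1}}{n} = \left(1 - \frac{1}{n}\right)\mu_{1,1},$$

as required.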

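62. Variance and covariance of moments about a fixed point
The moment of the sample about the fixed point (0, 0), of order
$q$ in $x$ and $r$ in $y$, is given by

$$nm'_{q,r} = \sum_i f_i x_i^q y_i^r,$$

and therefore, with the usual notation,

$$n\,\delta m'_{q,r} = \sum_i x_i^q y_i^r\,\delta f_i.$$

Squaring both sides, and taking expected values as in § 57, we deduce

$$n^2E(\delta m'_{q,r})^2 = n\mu'_{2q,2r} - n\mu'^2_{q,r},$$

and therefore

$$E(\delta m'_{q,r})^2 = \frac{1}{n}\left(\mu'_{2q,2r} - \mu'^2_{q,r}\right) \tag{31}$$

in terms of the moments of the population. This is the sampling
variance of $m'_{q,r}$. If, in taking expected values, we use the approximations
(10') and (14), we obtain the unbiased estimate

$$E(\delta m'_{q,r})^2 \approx \frac{1}{n-1}\left(m'_{2q,2r} - m'^2_{q,r}\right) \tag{32}$$

in terms of the moments of the sample.

Similarly, we may find the sampling covariance of the moments
$m'_{q,r}$ and $m'_{s,t}$ about (0, 0), by multiplying together the expressions
for $\delta m'_{q,r}$ and $\delta m'_{s,t}$ and taking expected values of both sides.
Proceeding as above we obtain the result

$$E(\delta m'_{q,r}\,\delta m'_{s,t}) = \frac{1}{n}\left(\mu'_{q+s,\,r+t} - \mu'_{q,r}\,\mu'_{s,t}\right). \tag{33}$$

In particular by putting $r = s = 0$ and $q = t = 1$, and observing
that $\mu'_{1,0} = \mu$, $\mu'_{0,1} = \mu'$, we find again the covariance of the means

$$E(\delta\bar{x}\,\delta\bar{y}) = \frac{1}{n}\left(\mu'_{1,1} - \mu\mu'\right) = \frac{\mu_{1,1}}{n}.$$

As in other cases we may deduce the unbiased estimate corresponding
to (33), with sample moments and denominator $n - 1$.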

63. Standard error of the covariance of a large sample

By argument similar to that of § 59 we may find the s.e. of the
covariance of a large sample from a bivariate population. For, with
the usual notation, since

$$m_{1,1} = m'_{1,1} - \bar{x}\bar{y},$$

we have

$$\delta m_{1,1} = \delta m'_{1,1} - \bar{y}\,\delta\bar{x} - \bar{x}\,\delta\bar{y}.$$

If now we take the origin at the mean of the population, $\bar{x}$ becomes
identical with $\delta\bar{x}$ and $\bar{y}$ with $\delta\bar{y}$. Consequently, if we retain only the
principal terms we have, on squaring the last equation and taking
expected values,

$$E(\delta m_{1,1})^2 = E(\delta m'_{1,1})^2,$$

and, since the origin is at the mean of the population, this is, in
virtue of (31),

$$E(\delta m_{1,1})^2 = \frac{1}{n}\left(\mu_{2,2} - \mu_{1,1}^2\right), \tag{34}$$

giving the square of the s.e. of the covariance of the sample.

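64. Standard error of the coefficient of correlation

The s.e. of the coefficient of correlation, $r$, for large samples of
$n$ pairs from a bivariate population, may be found as follows.*

* Cf. Bowley, 1920, 2, pp. 422-3.

Omitting the comma between the two subscripts in the moment
symbols we have, by the usual formula,

$$r = \frac{m_{11}}{\sqrt{m_{20}\,m_{02}}},$$

and therefore, by logarithmic differentiation,

$$\frac{\delta r}{r} = \frac{\delta m_{11}}{m_{11}} - \frac{1}{2}\frac{\delta m_{20}}{m_{20}} - \frac{1}{2}\frac{\delta m_{02}}{m_{02}}. \tag{i}$$

If the origin is taken at the mean of the population, so that $\bar{x}$ and
$\bar{y}$ are identical with $\delta\bar{x}$ and $\delta\bar{y}$, and we retain only small quantities
of the first order, we find as in the preceding section

$$\delta m_{11} = \delta m'_{11} = \frac{1}{n}\sum_i x_iy_i\,\delta f_i.$$

Similarly, from the relations

$$m_{20} = m'_{20} - \bar{x}^2, \qquad m_{02} = m'_{02} - \bar{y}^2,$$

we find, on retaining only the principal terms,

$$\delta m_{20} = \delta m'_{20} = \frac{1}{n}\sum_i x_i^2\,\delta f_i, \qquad \delta m_{02} = \delta m'_{02} = \frac{1}{n}\sum_i y_i^2\,\delta f_i.$$

Now substitute these values in (i). Then, since $n$ is large and we are
retaining only small quantities of the first order, we may replace
the denominators in (i) by the moments and correlation coefficient
$\rho$ for the population. Thus

$$\frac{\delta r}{\rho} = \frac{1}{n}\sum_i\left[\frac{x_iy_i}{\mu_{11}} - \frac{x_i^2}{2\mu_{20}} - \frac{y_i^2}{2\mu_{02}}\right]\delta f_i = \sum_i F(x_i, y_i)\,\delta f_i/n,$$

where $F(x_i, y_i)$ denotes the expression in brackets. Squaring both
sides we have

$$\left(\frac{\delta r}{\rho}\right)^2 = \frac{1}{n^2}\sum_i F^2(x_i, y_i)(\delta f_i)^2 + \frac{1}{n^2}{\sum_{i,j}}' F(x_i, y_i)\,F(x_j, y_j)\,\delta f_i\,\delta f_j,$$

and therefore, on taking expected values,

$$\frac{E(\delta r)^2}{\rho^2} = \frac{1}{n}\sum_i F^2(x_i, y_i)\,p_i(1 - p_i) - \frac{1}{n}{\sum_{i,j}}' F(x_i, y_i)\,F(x_j, y_j)\,p_ip_j$$

$$\qquad = \frac{1}{n}\sum_i p_iF^2(x_i, y_i) - \frac{1}{n}\Big[\sum_i p_iF(x_i, y_i)\Big]^2.$$

Now the second sum on the right is equal to zero. For

$$\sum_i p_iF(x_i, y_i) = \frac{\mu_{11}}{\mu_{11}} - \frac{\mu_{20}}{2\mu_{20}} - \frac{\mu_{02}}{2\mu_{02}} = 0.$$

On substituting the value of $F^2(x_i, y_i)$ we deduce the sampling
variance of $r$ in the form

$$E(\delta r)^2 = \frac{\rho^2}{n}\left[\frac{\mu_{22}}{\mu_{11}^2} + \frac{\mu_{40}}{4\mu_{20}^2} + \frac{\mu_{04}}{4\mu_{02}^2} + \frac{\mu_{22}}{2\mu_{20}\mu_{02}} - \frac{\mu_{31}}{\mu_{11}\mu_{20}} - \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right]. \tag{35}$$

This result assumes a very simple form in the case of a bivariate
normal population. The parameter values are then

$$\mu_{22} = (1 + 2\rho^2)\,\sigma_1^2\sigma_2^2, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{20} = \sigma_1^2, \quad \mu_{02} = \sigma_2^2,$$

$$\mu_{40} = 3\sigma_1^4, \quad \mu_{04} = 3\sigma_2^4, \quad \mu_{31} = 3\rho\sigma_1^3\sigma_2, \quad \mu_{13} = 3\rho\sigma_1\sigma_2^3,$$

and on substituting these values we obtain

$$E(\delta r)^2 = \frac{(1 - \rho^2)^2}{n}.$$

Consequently

$$\text{s.e. of } r = \frac{1 - \rho^2}{\sqrt{n}}. \tag{36}$$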
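
The reduction of (35) to (36) may be confirmed numerically. The following sketch evaluates the bracket of (35) with the bivariate normal moments quoted above; the function names and the values of $n$, $\rho$, $\sigma_1$, $\sigma_2$ are assumptions made for the illustration.

```python
from math import sqrt

def var_r(n, rho, m20, m02, m11, m22, m40, m04, m31, m13):
    """Formula (35): large-sample sampling variance of r from population moments."""
    bracket = (m22 / m11 ** 2 + m40 / (4 * m20 ** 2) + m04 / (4 * m02 ** 2)
               + m22 / (2 * m20 * m02) - m31 / (m11 * m20) - m13 / (m11 * m02))
    return rho ** 2 / n * bracket

def normal_moments(rho, s1, s2):
    """Moments of a bivariate normal population about its mean."""
    return dict(m20=s1 ** 2, m02=s2 ** 2, m11=rho * s1 * s2,
                m22=(1 + 2 * rho ** 2) * s1 ** 2 * s2 ** 2,
                m40=3 * s1 ** 4, m04=3 * s2 ** 4,
                m31=3 * rho * s1 ** 3 * s2, m13=3 * rho * s1 * s2 ** 3)

n, rho = 3600, 0.45                       # assumed sample size and correlation
v = var_r(n, rho, **normal_moments(rho, 2.0, 5.0))
se_r = sqrt(v)

print(se_r, (1 - rho ** 2) / sqrt(n))     # general formula against (36)
```

The standard deviations cancel, as they should, so that the result depends only on $\rho$ and $n$.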


These formulae hold for large values of $n$. Their application,
however, is limited by the fact that the sampling distribution of $r$
is not even approximately normal when $r$ is fairly large. The s.e.
test for $r$ should therefore be applied only for large samples and
moderate values of $r$. A much more useful test, which is applicable
to small or large samples, will be considered in Chapter X.

EXAMPLES VII

1. In a sample of 10,000 from a normal population the s.d. is
2.52 cm. Show that the s.e. of this quantity is 0.018 cm. nearly, and
hence that the s.d. of the population almost certainly lies between
2.46 and 2.58 cm.

2. A random sample of 6,400 members from a certain population
has a s.d. of 6.80 years, and a fourth moment of 3298 yr.$^4$ Show that
three times the s.e. of the s.d. is 0.094 nearly, and hence that the
s.d. of the population almost certainly lies between 6.7 and 6.9 years.

3. The s.d. of a random sample of 1,000 members is 5.9 years,
and that of an independent sample of 900 members is 6.1 years.
Show that the samples may reasonably be regarded as drawn from
equally variable normal populations, since the difference of their
standard deviations is very little greater than its s.e.

4. Standard error of the coefficient of variation. By definition this
coefficient is $V = S/\bar{x} = \sqrt{m_2}/\bar{x}$. Logarithmic differentiation gives

$$\frac{\delta V}{V} = \frac{\delta m_2}{2m_2} - \frac{\delta\bar{x}}{\bar{x}}.$$

Squaring both sides and taking expected values show that, for
large samples,

$$\frac{E(\delta V)^2}{V^2} = \frac{\mu_4 - \mu_2^2}{4n\mu_2^2} + \frac{\mu_2}{n\mu^2} - \frac{E(\delta\bar{x}\,\delta m_2)}{\mu\mu_2}.$$

It can be shown that, for samples from a normal population,
$E(\delta\bar{x}\,\delta m_2) = 0$. Show that in this case the above result leads to

$$\text{s.e. of } V = V\sqrt{(1 + 2V^2)/2n}.$$

5. Standard error of a moment about the mean of a large sample.
Proceeding as in § 59 we have

$$nm_r = \sum_i f_i (x_i - \bar{x})^r = \sum_i f_i x_i^r - r\bar{x}\sum_i f_i x_i^{r-1} + \ldots,$$

that is, $m_r = m'_r - r\bar{x}\,m'_{r-1} + \ldots$, and therefore

$$\delta m_r = \delta m'_r - r\,m'_{r-1}\,\delta\bar{x} - r\bar{x}\,\delta m'_{r-1} + \ldots.$$

If the origin is taken at the mean of the population, $\bar{x}$ becomes
identical with $\delta\bar{x}$; and, as we need retain only the terms of lowest
order, we neglect those after the first two. On squaring we may
write the result

$$(\delta m_r)^2 = (\delta m'_r)^2 - 2r\mu_{r-1}\,\delta m'_r\,\delta\bar{x} + r^2\mu_{r-1}^2(\delta\bar{x})^2,$$

and on taking expected values of both sides we obtain, in virtue
of (15) and (19), the origin being the mean of the population,

$$nE(\delta m_r)^2 = \mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1} + r^2\mu_2\mu_{r-1}^2.$$

The square root of this quantity, divided by $\sqrt{n}$, is the s.e. of $m_r$.

6. Deduce that, for large samples from a normal population of
s.d. $\sigma$, the s.e.'s of $m_3$ and $m_4$ are $\sigma^3\sqrt{6/n}$ and $\sigma^4\sqrt{96/n}$ respectively.

7. A random sample of 3,600 pairs of values from a bivariate
normal population showed a correlation coefficient of 0.45. Find
limits to the correlation in the population.

Since the sample is large we may take 0.45 as an approximate
value for $\rho$. The s.e. of $r$ is then $(1 - 0.2025)/60 = 0.0133$, so that
$3\epsilon = 0.04$ nearly. The coefficient of correlation in the population
almost certainly lies between 0.41 and 0.49.

8. A random sample of 2,500 pairs of values from a bivariate
normal population showed a correlation of 0.1. Is this really significant
of correlation in the population?

On the assumption that the population is uncorrelated we have
the s.e. of $r = 1/50 = 0.02$. The actual value is 5 times this s.e.,
and is therefore significant.

Show that the above correlation of 0.1 would not be significant
in a sample of less than 400 pairs.

9. By means of the result of Examples VI, 18 (p. 129) deduce*
the formulae (15) of § 57 and (19) of § 58.
* Cf. Kendall, 1943, 2, pp. 206-6.

BETA AND GAMMA DISTRIBUTIONS

65. Beta and Gamma Functions

For the benefit of the student who is not familiar with them, we
shall first prove the elementary properties of the Beta and Gamma
functions. The integral

$$\Gamma(n) = \int_0^\infty x^{n-1}e^{-x}\,dx \tag{1}$$

converges if $n$ is positive. It is a function of $n$ called the Gamma
Function. Clearly

$$\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1. \tag{2}$$

Also, if $n - 1$ is positive, we have on integration by parts

$$\Gamma(n) = \left[-x^{n-1}e^{-x}\right]_0^\infty + \int_0^\infty (n-1)\,x^{n-2}e^{-x}\,dx,$$

so that

$$\Gamma(n) = (n-1)\,\Gamma(n-1). \tag{3}$$

Hence, if $n$ is a positive integer,

$$\Gamma(n) = (n-1)(n-2)\ldots 2\cdot 1\cdot\Gamma(1) = (n-1)!. \tag{4}$$

On account of the property expressed by (3) and (4), $\Gamma(n)$ is often
denoted by $(n-1)!$ whether $n$ is integral or not. Also, on writing
$x^2$ in place of $x$ in (1), we have the alternative formula

$$\Gamma(n) = 2\int_0^\infty x^{2n-1}\exp(-x^2)\,dx. \tag{5}$$

And, by an obvious substitution, it is easily verified that, if $a$ is
positive,

$$\int_0^\infty e^{-ax}x^{n-1}\,dx = a^{-n}\,\Gamma(n). \tag{6}$$

The integral

$$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx \tag{7}$$

also converges if $m$ and $n$ are positive. It is a function of $m$ and $n$
called the Beta Function. That it is symmetrical in $m$ and $n$ is
easily shown by the substitution $z = 1 - x$. Then (7) becomes

$$B(m, n) = \int_0^1 z^{n-1}(1 - z)^{m-1}\,dz = B(n, m).$$

Further, on substituting $x = \sin^2\theta$ in (7) we obtain

$$B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta, \tag{8}$$

and therefore in particular

$$B(\tfrac{1}{2}, \tfrac{1}{2}) = 2\int_0^{\pi/2} d\theta = \pi, \tag{9}$$

while from (7) it is obvious that

$$B(1, 1) = 1. \tag{10}$$

Lastly the substitution $x = 1/(1 + y)$ in (7) leads to an important
alternative definition, viz.

$$B(m, n) = \int_0^\infty \frac{y^{n-1}\,dy}{(1 + y)^{m+n}}, \tag{11}$$

and in this integral $m$ and $n$ may be interchanged, in virtue of the
symmetry of the function.

Example. Show that

$$B(m, n) = \int_0^1 \frac{x^{m-1} + x^{n-1}}{(1 + x)^{m+n}}\,dx.$$

This may be deduced from (11) by dividing the range of integration into two
parts, 0 to 1 and 1 to $\infty$, and putting $y = 1/x$ in the integration over the
second part.
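
The elementary properties above lend themselves to a quick numerical check. In this sketch the library function `math.gamma` is used for $\Gamma(n)$, and the integral (7) is evaluated by a simple midpoint rule; the helper name, the step count and the particular arguments are assumptions of ours, and the quadrature is intended only for $m, n \geq 1$, where the integrand is bounded.

```python
from math import gamma, factorial, isclose

def beta_by_quadrature(m, n, steps=20_000):
    """Midpoint-rule value of integral (7); intended for m, n >= 1."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (m - 1) * (1 - (k + 0.5) * h) ** (n - 1)
                   for k in range(steps))

recurrence_ok = isclose(gamma(4.7), 3.7 * gamma(3.7))   # property (3)
factorial_ok = isclose(gamma(6), factorial(5))          # property (4)

b_2_3 = beta_by_quadrature(2, 3)    # direct integration of x(1-x)^2 gives 1/12
b_3_2 = beta_by_quadrature(3, 2)    # symmetry: should equal B(2, 3)

print(recurrence_ok, factorial_ok, b_2_3, b_3_2)
```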


66. Relation between the two functions
That the Beta and Gamma functions are connected by the relation

$$B(m, n) = \frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m + n)} \tag{12}$$

may be proved as follows. Consider the integrals

$$J_1 = 2\int_0^a x^{2m-1}\exp(-x^2)\,dx, \qquad J_2 = 2\int_0^a y^{2n-1}\exp(-y^2)\,dy,$$

whose limiting values as $a$ tends to infinity are $\Gamma(m)$ and $\Gamma(n)$
respectively, in virtue of (5). Then

$$J_1J_2 = 4\int_0^a dx\int_0^a \exp(-x^2 - y^2)\,x^{2m-1}y^{2n-1}\,dy,$$

or, on changing to polar coordinates,

$$J_1J_2 = 4\iint \exp(-r^2)\,r^{2m+2n-1}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,dr\,d\theta,$$

the integration extending over the square OACB (see Fig. 4, p. 66).
Since the integrand is positive, this integral is intermediate in value
between the integrals of the same function extended over the
quadrant OAB, of radius $a$, and the quadrant OPQ, of radius $a\sqrt{2}$.
Hence $J_1J_2$ lies in value between

$$4\int_0^{\pi/2}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta\int_0^a \exp(-r^2)\,r^{2m+2n-1}\,dr,$$

and the corresponding integral with 0 and $a\sqrt{2}$ as the limits for $r$.
But, as $a$ tends to infinity, each of these integrals tends to the limit
$B(m, n)\,\Gamma(m + n)$, while $J_1$ and $J_2$ tend to the limits $\Gamma(m)$ and $\Gamma(n)$
respectively. Consequently

$$B(m, n)\,\Gamma(m + n) = \Gamma(m)\,\Gamma(n)$$

as stated in (12).

Putting $m = n = \tfrac{1}{2}$ in this result we have, in virtue of (2) and (9),
$[\Gamma(\tfrac{1}{2})]^2 = \pi$. Consequently

$$\Gamma(\tfrac{1}{2}) = \sqrt{\pi}, \tag{13}$$

or

$$\int_0^\infty x^{-1/2}e^{-x}\,dx = \sqrt{\pi}.$$

By writing $x^2$ in place of $x$ in this integral, or by putting $n = \tfrac{1}{2}$ in
(5), we deduce

$$\int_0^\infty \exp(-x^2)\,dx = \tfrac{1}{2}\sqrt{\pi}. \tag{14}$$
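
Relations (12), (13) and (14) may be illustrated numerically. This is a sketch under stated assumptions: `math.gamma` supplies $\Gamma$, the Beta function is computed through (12), and the integral in (14) is truncated at an assumed upper limit of 10, beyond which the integrand is negligible.

```python
from math import gamma, sqrt, pi, exp, isclose

def beta(m, n):
    """Beta function computed through relation (12)."""
    return gamma(m) * gamma(n) / gamma(m + n)

def gauss_half_integral(upper=10.0, steps=100_000):
    """Midpoint-rule value of the integral of exp(-x^2) from 0 to `upper`."""
    h = upper / steps
    return h * sum(exp(-((k + 0.5) * h) ** 2) for k in range(steps))

b_half_half = beta(0.5, 0.5)        # should equal pi, by (9)
gamma_half = gamma(0.5)             # should equal sqrt(pi), by (13)
integral_14 = gauss_half_integral() # should equal sqrt(pi)/2, by (14)

print(b_half_half, gamma_half, integral_14)
```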


Example 1. Show that, if $m$ is a positive integer,

$$\Gamma(m + \tfrac{1}{2}) = \frac{2m-1}{2}\cdot\frac{2m-3}{2}\cdots\frac{3}{2}\cdot\frac{1}{2}\,\sqrt{\pi}.$$

Example 2. Show that, if $m$ and $n$ are positive integers,

$$B(m, n) = \frac{(m-1)!\,(n-1)!}{(m+n-1)!}.$$

In particular, $B(1, n) = 1/n$.

Example 3. Show that

$$B(m+1, n)/B(m, n) = m/(m+n)$$

and

$$B(m+2, n)/B(m, n) = m(m+1)/(m+n)(m+n+1).$$

Example 4. Show that

$$B(m+2, n-2)/B(m, n) = m(m+1)/(n-1)(n-2).$$

Example 5. Show that

$$\int_0^a (a - x)^{m-1}x^{n-1}\,dx = a^{m+n-1}B(m, n).$$

67. Gamma distribution and Gamma variates
In virtue of (1) a continuous variable $x$, which is distributed
with probability density

$$\phi(x) = \frac{e^{-x}x^{l-1}}{\Gamma(l)} \tag{15}$$

throughout the range 0 to $\infty$, is called a Gamma variate with parameter
$l$; and its distribution is a Gamma distribution.* The factor
$1/\Gamma(l)$ ensures that the integral of $\phi(x)$ over the whole range of
values of $x$ is unity. The reader should sketch the probability curve
$y = \phi(x)$ for the distribution. He will find that it is asymptotic to
the x-axis and that, if $l > 1$, it has a mode at $x = l - 1$. If $l > 2$ it also
touches the x-axis at the origin; while, if $1 < l < 2$, it is tangent to the
y-axis at that point. If, however, $0 < l < 1$, the curve is asymptotic
to both axes.

The expected value of the variate in the distribution is given by

$$E(x) = \int_0^\infty x\,\phi(x)\,dx = \Gamma(l+1)/\Gamma(l) = l. \tag{16}$$

The second moment about $x = 0$ is similarly

$$\mu'_2 = \int_0^\infty x^2\phi(x)\,dx = \Gamma(l+2)/\Gamma(l) = l(l+1).$$

* This belongs to Karl Pearson's Type III. See Kendall, 1943, 2, pp. 137-43.

Hence the variance is given by

$$\sigma^2 = l(l+1) - l^2 = l. \tag{17}$$

Example. Show that the $r$th moment about $x = 0$ is
$$\mu'_r = l(l+1)(l+2)\ldots(l+r-1),$$
and deduce that the third moment about the mean is $2l$.
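
The moments just obtained can be checked numerically from the ratio $\Gamma(l+r)/\Gamma(l)$. The sketch below assumes an arbitrary parameter value $l = 2.5$; the function names are ours.

```python
from math import gamma, isclose

def raw_moment(l, r):
    """r-th moment about x = 0 of a Gamma variate with parameter l."""
    return gamma(l + r) / gamma(l)

def rising_product(l, r):
    """The product l (l+1) (l+2) ... (l+r-1)."""
    out = 1.0
    for k in range(r):
        out *= l + k
    return out

l = 2.5                                                  # assumed parameter
mean = raw_moment(l, 1)
variance = raw_moment(l, 2) - mean ** 2
# Third moment about the mean in terms of moments about the origin.
third_central = raw_moment(l, 3) - 3 * mean * raw_moment(l, 2) + 2 * mean ** 3

print(mean, variance, third_central)
```

The three printed values should be $l$, $l$ and $2l$ respectively, apart from rounding.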


An important example of a Gamma variate is associated with the
normal distribution. If $x$ is normally distributed with mean $a$ and
s.d. $\sigma$, the probability that a random value of the variate will fall
in the interval $dx$ is

$$dP = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-(x - a)^2/2\sigma^2\right]dx. \tag{18}$$

Let $u$ be defined by

$$u = \tfrac{1}{2}(x - a)^2/\sigma^2,$$

so that, as $x$ varies from $-\infty$ to $+\infty$, $u$ varies from $+\infty$ to 0 and then
from 0 to $+\infty$. For values of $x$ between $a$ and $+\infty$,

$$x - a = \sigma\sqrt{2u} \quad\text{and}\quad dx = \sigma\,du/\sqrt{2u},$$

so that

$$dP = \frac{e^{-u}u^{-1/2}\,du}{2\sqrt{\pi}}.$$

But the probability that $u$ falls in the interval $du$ is double this,
since there is an equal probability that $u$ will fall in this interval
when $x$ lies between $-\infty$ and $a$. Consequently the probability
differential for the variate $u$ is

$$dP = \frac{e^{-u}u^{-1/2}\,du}{\Gamma(\tfrac{1}{2})},$$

so that $u$ is a Gamma variate with parameter $\tfrac{1}{2}$. We may therefore
state

Theorem I. If $x$ is normally distributed with mean $a$ and standard
deviation $\sigma$, then $\tfrac{1}{2}(x - a)^2/\sigma^2$ is a Gamma variate with parameter $\tfrac{1}{2}$.

A Gamma variate with parameter $l$ may be referred to briefly as
a $\gamma(l)$ variate, the symbol being used adjectively.*

* The objection to describing it as a $\Gamma(l)$ variate is the additional meaning
thus given to the symbol $\Gamma(l)$.

68. Sum of independent Gamma variates
The moments of the Gamma distribution, and the distribution
of the sum of independent Gamma variates, may be deduced from
the moment generating function. The m.g.f. of a $\gamma(l)$ variate with
respect to the origin is given by

$$M(t) = \int_0^\infty \frac{e^{tx}e^{-x}x^{l-1}}{\Gamma(l)}\,dx = \frac{1}{\Gamma(l)}\int_0^\infty e^{-(1-t)x}x^{l-1}\,dx = (1 - t)^{-l} \qquad (|t| < 1), \tag{19}$$

and the cumulative function is therefore

$$K(t) = -l\log(1 - t) = l\,(t + t^2/2 + t^3/3 + t^4/4 + \ldots). \tag{20}$$

Thus the mean of the distribution is $l$, and the variance also $l$,
while the other cumulants are

$$\kappa_3 = 2!\,l, \quad \kappa_4 = 3!\,l, \quad \ldots, \quad \kappa_r = (r-1)!\,l. \tag{21}$$

Suppose now that $x$ and $y$ are independent Gamma variates with
parameters $l$ and $m$ respectively. Then the m.g.f. of their sum, being
equal to the product of their m.g.f.'s, is $(1 - t)^{-(l+m)}$. But this is the
m.g.f. of a $\gamma(l + m)$ variate. We thus have

Theorem II. The sum of two independent Gamma variates, with
parameters $l$ and $m$, is a Gamma variate with parameter $l + m$.

The converse of this theorem is almost equally important. It
may be stated:

Theorem III. If the sum of two independent positive variates is a
Gamma variate with parameter $l + m$, and one of them is a Gamma
variate with parameter $l$, then the other is a Gamma variate with
parameter $m$.

For, if $M(t)$ denotes the m.g.f. of the last variate, we have, on
equating the m.g.f. of the sum to the product of those of its components,

$$(1 - t)^{-(l+m)} = (1 - t)^{-l}M(t),$$

whence

$$M(t) = (1 - t)^{-m},$$

and the second component is therefore a $\gamma(m)$ variate.

On account of the importance of these theorems we shall prove
Theorem II from first principles. Let $x$ and $y$ be independent $\gamma(l)$
and $\gamma(m)$ variates, and let $z = x + y$. Suppose first that $y$ has a fixed
value in the interval $dy$. Then $dz = dx$. The probability that a random
value of $x$ will fall in the interval $dx$ is

$$dp = e^{-x}x^{l-1}\,dx/\Gamma(l),$$

and therefore the probability that, for a given value of $y$, $z$ will lie
in the interval $dz$ is

$$dp = e^{-(z-y)}(z - y)^{l-1}\,dz/\Gamma(l).$$

But the chance that $y$ will have a value in the interval $dy$ is

$$dp' = e^{-y}y^{m-1}\,dy/\Gamma(m).$$

The probability that simultaneously $z$ will lie in the interval $dz$
and $y$ in the interval $dy$ is the product $dp\,dp'$. By integration we
then have the probability that, for any value of $y$, $z$ will lie in the
interval $dz$ as

$$dP = \frac{e^{-z}\,dz}{\Gamma(l)\,\Gamma(m)}\int_0^z (z - y)^{l-1}y^{m-1}\,dy.$$

To evaluate the integral put $y = zt$, so that $dy = z\,dt$. Since $z$ is the
sum of the positive variates $x$ and $y$, it is never less than $y$, so that
$t$ lies within the range 0 to 1. Consequently

$$dP = \frac{e^{-z}z^{l+m-1}\,dz}{\Gamma(l)\,\Gamma(m)}\int_0^1 (1 - t)^{l-1}t^{m-1}\,dt = \frac{e^{-z}z^{l+m-1}\,dz\,B(m, l)}{\Gamma(l)\,\Gamma(m)} = \frac{e^{-z}z^{l+m-1}\,dz}{\Gamma(l + m)},$$

and $z$ is therefore a $\gamma(l + m)$ variate as stated.
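
The integration in this proof can be imitated numerically. The following sketch evaluates the convolution integral by the midpoint rule and compares it with the density of a $\gamma(l + m)$ variate at a few points; the parameter values, the points chosen and the function names are all assumptions made for the illustration.

```python
from math import exp, gamma

def gamma_density(x, l):
    """Probability density (15) of a Gamma variate with parameter l."""
    return exp(-x) * x ** (l - 1) / gamma(l)

def density_of_sum(z, l, m, steps=20_000):
    """Midpoint-rule value of the integral of phi_l(z - y) phi_m(y) for 0 < y < z."""
    h = z / steps
    return h * sum(gamma_density(z - (k + 0.5) * h, l) * gamma_density((k + 0.5) * h, m)
                   for k in range(steps))

l, m = 2.0, 3.5                           # assumed parameters
checks = [(z, density_of_sum(z, l, m), gamma_density(z, l + m))
          for z in (0.5, 2.0, 6.0)]

for z, convolved, direct in checks:
    print(z, convolved, direct)
```

At each point the convolved value should agree with the direct density to within the small error of the quadrature.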

Repeated application of this theorem shows that the sum of $n$
independent Gamma variates with parameters $m_i$ $(i = 1, 2, \ldots, n)$ is
a Gamma variate with parameter $\sum m_i$. In particular, in virtue of
Theorem I, we may state

Theorem IV. If $x_i$ $(i = 1, 2, \ldots, n)$ are $n$ independent variates,
normally distributed about a common mean zero with standard deviations
$\sigma_i$, then $\tfrac{1}{2}\sum_i x_i^2/\sigma_i^2$ is a Gamma variate with parameter $\tfrac{1}{2}n$.

69. Beta distribution of the first kind

In virtue of the first definition, (7), of the Beta function we shall
say that a continuous variate $x$, which is distributed with probability
density

$$\phi(x) = \frac{x^{l-1}(1 - x)^{m-1}}{B(l, m)} \tag{22}$$

throughout the range of values 0 to 1, is a Beta variate of the first
kind with parameters $l$ and $m$; and its distribution is a Beta distribution
of the first kind.* Such a variate may be referred to briefly as a
$\beta_1(l, m)$ variate. The reader should sketch the probability curve for
the distribution, distinguishing the different cases. He will find that,
if $l$ and $m$ are each greater than 1, there is a modal value

$$x = \frac{l - 1}{l + m - 2}.$$

If $l > 2$ the curve touches the x-axis at the origin; while, if $1 < l < 2$,
it is tangent to the y-axis at that point. If, however, $0 < l < 1$, the
curve is asymptotic to the y-axis. Similar remarks hold for the
shape of the curve near $x = 1$, according to the value of $m$.

The mean value of $x$ is given by

$$E(x) = \frac{1}{B(l, m)}\int_0^1 x^l(1 - x)^{m-1}\,dx = \frac{B(l+1, m)}{B(l, m)} = \frac{l}{l + m}. \tag{23}$$

The second moment $\mu'_2$ about $x = 0$ is similarly

$$\mu'_2 = \frac{B(l+2, m)}{B(l, m)} = \frac{l(l+1)}{(l + m)(l + m + 1)}.$$

From these it follows that the variance is

$$\sigma^2 = \frac{lm}{(l + m)^2(l + m + 1)}. \tag{24}$$
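
Formulae (23) and (24), and the modal value, may be verified numerically through the Beta function ratios. This is a sketch with assumed parameter values; `beta` is computed from the Gamma function by relation (12), and the function names are ours.

```python
from math import gamma, isclose

def beta(l, m):
    """Beta function through relation (12)."""
    return gamma(l) * gamma(m) / gamma(l + m)

def beta1_density(x, l, m):
    """Probability density (22) of a Beta variate of the first kind."""
    return x ** (l - 1) * (1 - x) ** (m - 1) / beta(l, m)

l, m = 2.5, 4.0                                  # assumed parameters, both > 1
mean = beta(l + 1, m) / beta(l, m)
second = beta(l + 2, m) / beta(l, m)
variance = second - mean ** 2
mode = (l - 1) / (l + m - 2)

print(mean, l / (l + m))
print(variance, l * m / ((l + m) ** 2 * (l + m + 1)))
print(mode, beta1_density(mode, l, m))
```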

Corresponding to Theorem II there is a fundamental theorem
which may be stated:

Theorem V. If $x$ and $y$ are independent Gamma variates with
parameters $l$ and $m$ respectively, the quotient $x/(x + y)$ is a Beta variate
of the first kind with parameters $l$ and $m$.

This may be proved from first principles. If we write

$$z = \frac{x}{x + y}, \quad\text{then}\quad x = \frac{yz}{1 - z},$$

and since $x$ and $y$ are both positive, the range of $z$ is from 0 to 1.
Suppose first that $y$ has a fixed value in the interval $dy$. Then

$$dx = y\,dz/(1 - z)^2.$$

* This belongs to Karl Pearson's Type I.

The probability that, for this value of $y$, $x$ will lie in the interval $dx$
and therefore $z$ in the interval $dz$ is

$$dp = \frac{e^{-x}x^{l-1}\,dx}{\Gamma(l)} = \frac{1}{\Gamma(l)}\exp\left(-\frac{yz}{1 - z}\right)\left(\frac{yz}{1 - z}\right)^{l-1}\frac{y\,dz}{(1 - z)^2}.$$

But the chance that $y$ will have a value in the interval $dy$ is

$$dp' = \frac{e^{-y}y^{m-1}\,dy}{\Gamma(m)}.$$

The probability that simultaneously $z$ will lie in the interval $dz$ and
$y$ in the interval $dy$ is the product $dp\,dp'$. By integration we then find
the probability that, for any value of $y$, $z$ will lie in the interval $dz$ as

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\,\Gamma(m)\,(1 - z)^{l+1}}\int_0^\infty \exp\left(-\frac{y}{1 - z}\right)y^{l+m-1}\,dy.$$

To evaluate the integral put $y = (1 - z)t$. Then the limits for $t$ are
0 and $\infty$, and we obtain

$$dP = \frac{z^{l-1}(1 - z)^{m-1}\,\Gamma(l + m)\,dz}{\Gamma(l)\,\Gamma(m)} = \frac{z^{l-1}(1 - z)^{m-1}\,dz}{B(l, m)},$$

showing that $z$ is a $\beta_1(l, m)$ variate as stated.
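
Theorem V may also be illustrated by random sampling. The sketch below is a simulation, not a proof: it draws independent Gamma variates with the standard library function `random.gammavariate` (shape $l$, scale 1, which corresponds to the density of § 67), forms the quotient $x/(x + y)$, and compares the sample mean and variance with (23) and (24). The parameters, the number of trials and the seed are assumed values.

```python
import random

def simulate_ratio(l, m, trials, seed=12345):
    """Draw independent x ~ gamma(l), y ~ gamma(m); return the values x / (x + y)."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        x = rng.gammavariate(l, 1.0)   # shape l, scale 1
        y = rng.gammavariate(m, 1.0)   # shape m, scale 1
        out.append(x / (x + y))
    return out

l, m, trials = 2.0, 3.0, 100_000
z = simulate_ratio(l, m, trials)
sample_mean = sum(z) / trials
sample_var = sum((v - sample_mean) ** 2 for v in z) / trials

print(sample_mean, l / (l + m))                             # compare with (23)
print(sample_var, l * m / ((l + m) ** 2 * (l + m + 1)))     # compare with (24)
```

Agreement is only to within sampling fluctuations, which for this number of trials should be of the order of one unit in the third decimal place.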

We may remark in passing that, if z is a variate and 

V a 1 — z, then v is a fiiim, 1) variate. For the range of v is also from 
0 to 1, and I dv I s | dz |. Expressing the probability differential for 
z in terms of v and dv, we obtain 


urhich shows that v is a 1) variate. And the reader will observe 

that dP depends on the magnitude | dv | of the interval dv, but not 
upon the sign of dv/dz. 


70. Alternative proof of theorems

A combined proof of Theorems II and V may be given more
simply as follows.* As before let $x$ and $y$ be independent $\gamma(l)$ and
$\gamma(m)$ variates respectively. Then the probability that a random
value of $x$ will fall in the interval $dx$ and, at the same time, a random
value of $y$ will fall in the interval $dy$ is

$$dP = \frac{e^{-x-y}\,x^{l-1}y^{m-1}\,dx\,dy}{\Gamma(l)\,\Gamma(m)}.$$

Now introduce the new variables

$$u = x + y, \qquad v = x/(x+y),$$

so that $x = uv$, $y = u(1-v)$.

Then as $x$ and $y$ range from 0 to $\infty$, $u$ ranges from 0 to $\infty$ and $v$ from
0 to 1. Also

$$\frac{\partial(x,y)}{\partial(u,v)} = -u.$$

From the above expression for the probability differential $dP$ it
follows that the probability density in the joint distribution of
$x$ and $y$ is

$$\frac{e^{-x-y}\,x^{l-1}y^{m-1}}{\Gamma(l)\,\Gamma(m)} = \frac{e^{-u}\,u^{l+m-2}\,v^{l-1}(1-v)^{m-1}}{\Gamma(l)\,\Gamma(m)}.$$

But the area of the element of the $xy$-plane bounded by the curves
along which $u$ has the values $u$ and $u+du$ respectively, and those
along which $v$ has the values $v$ and $v+dv$ is

$$\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv = u\,du\,dv.$$

Hence the probability that, in a random selection of $x$ and $y$, the
representative point $(x, y)$ will fall in this element of area is

$$dP = \frac{e^{-u}\,u^{l+m-1}\,du}{\Gamma(l+m)} \cdot \frac{v^{l-1}(1-v)^{m-1}\,dv}{B(l,m)}. \tag{25}$$

Since this is the probability that simultaneously $u$ will fall in the
interval $du$ and $v$ in the interval $dv$ it follows that these variates
are independent, and that $u$ is a $\gamma(l+m)$ variate and $v$ a $\beta_1(l, m)$
variate, as stated in the above theorems.

* This proof is given by Sawkins, 1940, 3, p. 212.
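The independence of the sum and the ratio can be probed numerically. This is a hedged sketch, not the author's method: it estimates the correlation between $u = x+y$ and $v = x/(x+y)$ from simulated pairs. A correlation near zero is only a necessary consequence of independence, not a proof of it. The helper names, seed and parameter values are assumptions made for illustration.

```python
import math
import random


def sum_and_ratio_sample(l, m, draws, seed):
    """Simulate pairs (u, v) = (x + y, x/(x+y)) for independent gamma(l), gamma(m)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(draws):
        x = rng.gammavariate(l, 1.0)
        y = rng.gammavariate(m, 1.0)
        pairs.append((x + y, x / (x + y)))
    return pairs


def correlation(pairs):
    """Ordinary product-moment correlation of a list of (u, v) pairs."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in pairs)
    s_vv = sum((v - mean_v) ** 2 for _, v in pairs)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    return s_uv / math.sqrt(s_uu * s_vv)


if __name__ == "__main__":
    sample = sum_and_ratio_sample(2.0, 3.0, 100_000, seed=2024)
    print("correlation of u and v:", correlation(sample))
    print("mean of u (expected l + m = 5):", sum(u for u, _ in sample) / len(sample))
```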
71. Product of a $\beta_1(l, m)$ variate and a $\gamma(l+m)$ variate

We may now prove an important property, suggested by Theorem
V, which may be stated:

Theorem VI. The product of a $\beta_1(l, m)$ variate and an independent
$\gamma(l+m)$ variate is a $\gamma(l)$ variate.

Let $v$ be the $\beta_1(l, m)$ variate and $u$ the $\gamma(l+m)$ variate, and let
$z = uv$. The probability differential for $v$ is

$$dp = \frac{v^{l-1}(1-v)^{m-1}\,dv}{B(l,m)}. \tag{26}$$

Hence, for a fixed value of $u$ in the interval $du$, the probability that
$z$ will fall in the interval $dz$ is

$$dp = \frac{1}{B(l,m)}\left(\frac{z}{u}\right)^{l-1}\left(1-\frac{z}{u}\right)^{m-1}\frac{dz}{u}.$$

Multiply this by the probability that a random value of $u$ will fall
in the interval $du$, and integrate over the range of $u$ from $z$ to $\infty$
(since $z < u$). Thus we have the probability that, for any value of
$u$, $z$ will lie in the interval $dz$ as

$$dP = \frac{z^{l-1}\,dz}{B(l,m)\,\Gamma(l+m)}\int_z^\infty \left(1-\frac{z}{u}\right)^{m-1} e^{-u}\,u^{m-1}\,du = \frac{z^{l-1}\,dz}{\Gamma(l)\,\Gamma(m)}\int_z^\infty e^{-u}(u-z)^{m-1}\,du.$$

To evaluate the integral put $u - z = zt$. Then $t$ ranges from 0 to $\infty$,
and we have for the probability differential of $z$,

$$dP = \frac{e^{-z}z^{l-1}\,dz}{\Gamma(l)\,\Gamma(m)}\int_0^\infty e^{-zt}\,z^{m}\,t^{m-1}\,dt = \frac{e^{-z}z^{l-1}\,dz}{\Gamma(l)}.$$

Consequently $z$ is a $\gamma(l)$ variate as stated.
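A simulation gives a quick plausibility check of Theorem VI. The sketch below is an illustration under stated assumptions, not part of the text: it multiplies draws from `random.betavariate(l, m)` by independent draws from `random.gammavariate(l + m, 1.0)` and compares the sample mean and variance of the product with the value $l$, which is both the mean and the variance of a $\gamma(l)$ variate. Names, seed and parameters are hypothetical.

```python
import random


def product_sample(l, m, draws, seed):
    """Draws of z = u * v with v ~ beta_1(l, m) and u ~ gamma(l + m) independent."""
    rng = random.Random(seed)
    return [rng.betavariate(l, m) * rng.gammavariate(l + m, 1.0) for _ in range(draws)]


def mean_and_variance(values):
    """Sample mean and (population-style) variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


if __name__ == "__main__":
    l, m = 2.0, 3.0
    mean, var = mean_and_variance(product_sample(l, m, 100_000, seed=7))
    # For a gamma(l) variate both the mean and the variance equal l.
    print("mean:", mean, "variance:", var, "expected:", l)
```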


72. Beta distribution of the second kind

In agreement with the alternative definition of $B(m, n)$ contained
in (11) we may define a Beta variate of the second kind, with positive
parameters $l$ and $m$, as a continuous variate $x$ which is distributed with
probability density

$$y = \frac{x^{l-1}}{B(l,m)\,(1+x)^{l+m}} \tag{27}$$

throughout the range $x = 0$ to $x = \infty$. Such a variate may be referred
to briefly as a $\beta_2(l, m)$ variate. Its distribution is a Beta distribution
of the second kind.* It is important to remember that, while the
values of a Beta variate of the first kind are from 0 to 1, those of a
Beta variate of the second kind are from 0 to $\infty$. The reader should
sketch the different forms of the probability curve for the latter
distribution. He will observe a considerable resemblance to the
case of a Gamma distribution. If $l > 1$ there is a mode at

$$x = (l-1)/(m+1). \tag{28}$$

The curve is asymptotic to the $x$-axis; and, if $l > 2$, it touches this
axis also at the origin. If $1 < l < 2$ the curve touches the $y$-axis at
the origin; and if $0 < l < 1$ the curve is asymptotic to both axes.
When $m > 1$ the mean value of the variate is given by

$$E(x) = \frac{1}{B(l,m)}\int_0^\infty \frac{x^{l}\,dx}{(1+x)^{l+m}} = \frac{B(l+1,\,m-1)}{B(l,m)} = \frac{l}{m-1}. \tag{29}$$

Also, if $m > 2$, the second moment about $x = 0$ is

$$\mu_2' = \frac{1}{B(l,m)}\int_0^\infty \frac{x^{l+1}\,dx}{(1+x)^{l+m}} = \frac{B(l+2,\,m-2)}{B(l,m)} = \frac{l(l+1)}{(m-1)(m-2)}.$$

Consequently the variance is

$$\sigma^2 = \frac{l(l+1)}{(m-1)(m-2)} - \frac{l^2}{(m-1)^2} = \frac{l(l+m-1)}{(m-1)^2(m-2)}. \tag{30}$$
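The closed forms (29) and (30) can be confirmed by numerical integration of the density (27). The following is a minimal sketch with assumed helper names: it substitutes $x = t/(1-t)$, which maps the range $(0, \infty)$ onto $(0, 1)$ and turns the $k$th moment integral into $\int_0^1 t^{l+k-1}(1-t)^{m-k-1}\,dt / B(l,m)$, and then applies the midpoint rule. It is intended only for $m > k$.

```python
import math


def beta_fn(a, b):
    """Complete Beta function B(a, b) expressed through Gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta2_raw_moment(l, m, k, steps=20_000):
    """k-th moment about 0 of the density x^(l-1) / (B(l,m) (1+x)^(l+m)), x > 0.

    Uses the substitution x = t/(1-t) and the midpoint rule on (0, 1).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (l + k - 1) * (1.0 - t) ** (m - k - 1)
    return total * h / beta_fn(l, m)


def beta2_mean_and_variance(l, m):
    """Closed forms (29) and (30); they require m > 2."""
    mean = l / (m - 1)
    variance = l * (l + m - 1) / ((m - 1) ** 2 * (m - 2))
    return mean, variance


if __name__ == "__main__":
    l, m = 3.0, 5.0
    m1 = beta2_raw_moment(l, m, 1)
    m2 = beta2_raw_moment(l, m, 2)
    print("numerical mean, variance:", m1, m2 - m1 ** 2)
    print("closed-form mean, variance:", beta2_mean_and_variance(l, m))
```

For $l = 3$, $m = 5$ the closed forms give a mean of $3/4$ and a variance of $7/16$.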


We may observe that, if $x$ is a Beta variate of the second kind,
its reciprocal is a variate of the same kind with parameters
interchanged. For, if $x$ is a $\beta_2(l, m)$ variate, its probability density is
given by (27). If then we put $x = 1/y$, we have $|dx| = |dy|/y^2$,
and therefore, since $y$ lies in the interval $dy$ when $x$ lies in the
interval $dx$, the probability of this is

$$dP = \frac{(1/y)^{l-1}\,(dy/y^2)}{B(l,m)\,(1+1/y)^{l+m}} = \frac{y^{m-1}\,dy}{B(l,m)\,(1+y)^{l+m}}.$$

Consequently $y$ is a $\beta_2(m, l)$ variate.


* This belongs to Karl Pearson’s Type VI. 
An important relation between the two kinds of Beta variates
is expressed by

Theorem VII. To each Beta variate of the first kind corresponds a
pair of Beta variates of the second kind; and conversely.

Let $v$ be a $\beta_1(l, m)$ variate, so that its probability differential is

$$dP = \frac{v^{l-1}(1-v)^{m-1}\,dv}{B(l,m)}, \tag{31}$$

and let $w$ be defined by

$$v = 1/(1+w) \tag{32}$$

or its equivalent $w = (1-v)/v$.

Then $|dv| = |dw|/(1+w)^2$, and by substitution we have the
probability differential of $w$ as

$$dP = \frac{w^{m-1}\,dw}{B(l,m)\,(1+w)^{l+m}}. \tag{33}$$

Since the range of $w$ is from 0 to $\infty$, it follows that $w$ is a $\beta_2(m, l)$
variate. Its reciprocal is therefore a $\beta_2(l, m)$ variate, and the first
part of the theorem is proved.

Conversely, given that $w$ is a $\beta_2(m, l)$ variate with probability
differential (33), we may define a variate $v$ by means of (32).
Substitution then shows that the probability differential of $v$ is given by
(31); and since $v$ ranges from 0 to 1, it is a $\beta_1(l, m)$ variate. The
variable $1 - v$ is therefore a $\beta_1(m, l)$ variate, and the second part of
the theorem is proved.


73. Quotient of independent Gamma variates

Corresponding to Theorem V, according to which a Beta variate
of the first kind is determined by two independent Gamma variates,
we have

Theorem VIII. The quotient of two independent Gamma variates,
with parameters $l$ and $m$, is a $\beta_2(l, m)$ variate.

Let $x$ and $y$ be independent Gamma variates with parameters $l$
and $m$ respectively, and let $v = y/(y+x)$. Then, in virtue of Theorem
V, $v$ is a $\beta_1(m, l)$ variate. But, if $w$ is the quotient $x/y$, clearly

$$v = \frac{1}{1+x/y} = \frac{1}{1+w},$$

so that, by Theorem VII, $w$ is a $\beta_2(l, m)$ variate as stated.

The theorem may also be proved by the method of § 70, which
leads to the additional information that the sum $x+y$ is distributed
independently of the quotient $x/y$. For if

$$u = x + y, \qquad v = x/y,$$

we have

$$\frac{\partial(u,v)}{\partial(x,y)} = -\frac{x+y}{y^2} = -\frac{(1+v)^2}{u},$$

and therefore

$$\left|\frac{\partial(x,y)}{\partial(u,v)}\right| = \frac{u}{(1+v)^2}.$$

Since the probability that $x$ and $y$ fall simultaneously in the
intervals $dx$ and $dy$ is

$$dp = \frac{x^{l-1}y^{m-1}e^{-x-y}\,dx\,dy}{\Gamma(l)\,\Gamma(m)},$$

the probability density for their joint distribution is

$$\frac{x^{l-1}y^{m-1}e^{-x-y}}{\Gamma(l)\,\Gamma(m)} = \frac{e^{-u}\,u^{l+m-2}\,v^{l-1}}{\Gamma(l)\,\Gamma(m)\,(1+v)^{l+m-2}}.$$

Consequently the probability that, in a random choice of $x$ and $y$,
the representative point $(x, y)$ will fall in the area bounded by the
curves $u$, $u+du$, $v$, $v+dv$ is

$$dP = \frac{e^{-u}\,u^{l+m-1}\,du}{\Gamma(l+m)} \cdot \frac{v^{l-1}\,dv}{B(l,m)\,(1+v)^{l+m}}.$$

Since this is the probability that $u$ and $v$ will fall simultaneously in
the intervals $du$ and $dv$ respectively, it follows that $u$ is a $\gamma(l+m)$
variate, and $v$ a $\beta_2(l, m)$ variate, and also that these variates are
independent.


Example 1. Cauchy's distribution. The distribution of the quotient of two
independent standard normal variates follows from the above theorem. For
if $z$ is this quotient, $z^2$ is the quotient of two independent Gamma variates,
each of parameter $\tfrac12$. Consequently $z^2$ is a $\beta_2(\tfrac12, \tfrac12)$ variate, with probability
differential

$$dP = \frac{(z^2)^{-1/2}\,d(z^2)}{B(\tfrac12,\tfrac12)\,(1+z^2)} = \frac{2\,dz}{\pi(1+z^2)}.$$

The distribution of $z$ follows immediately from this. For, since the range of
$z$ is from $-\infty$ to $+\infty$ while that of $z^2$ is from 0 to $+\infty$, the probability
differential of $z$ is

$$dP = \frac{dz}{\pi(1+z^2)},$$

the factor 2 disappearing, since the integral from $-\infty$ to $+\infty$ must be equal
to unity. This distribution is associated with the name of Cauchy.
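The Cauchy result lends itself to a direct numerical check. The sketch below is illustrative only: integrating $dz/\{\pi(1+z^2)\}$ gives the distribution function $\tfrac12 + \arctan(z)/\pi$ (a standard integral, stated here as an addition to the text), and this is compared with the empirical distribution of simulated quotients of standard normal variates. Function names, seed and sample size are assumptions.

```python
import math
import random


def cauchy_cdf(z):
    """Integral of dz / (pi (1 + z^2)) from -infinity to z."""
    return 0.5 + math.atan(z) / math.pi


def normal_quotients(draws, seed):
    """Quotients of pairs of independent standard normal variates."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(draws)]


def empirical_cdf(values, z):
    """Proportion of the simulated values not exceeding z."""
    return sum(1 for v in values if v <= z) / len(values)


if __name__ == "__main__":
    sample = normal_quotients(100_000, seed=99)
    for z in (-1.0, 0.0, 1.0, 2.0):
        print(z, empirical_cdf(sample, z), cauchy_cdf(z))
```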
Example 2. Show that, if $x$ and $y$ are independent normal variates with
zero means, and variances $\sigma_1^2$, $\sigma_2^2$ respectively, the quotient

$$z = x/y$$

conforms to the distribution

$$dP = \frac{\sigma_1\sigma_2\,dz}{\pi(\sigma_1^2 + \sigma_2^2 z^2)}$$

with a range $-\infty$ to $+\infty$.
EXAMPLES VIII

1. We shall see later (cf. § 82) that, if $r$ is the coefficient of
correlation between the variates in a random sample of $n$ pairs of values
from an uncorrelated bivariate normal population, then $r^2$ is a
$\beta_1(\tfrac12, \tfrac12(n-2))$ variate. Hence show that the mean value of $r^2$ for
such samples is $1/(n-1)$, and the s.e. of $r$ therefore $1/\sqrt{n-1}$. Also
show that the probability differential of $r$ is

$$dP = \frac{(1-r^2)^{\frac12(n-4)}\,dr}{B(\tfrac12, \tfrac12(n-2))}.$$

2. Quotient of independent Gamma variates. Theorem VIII may
be proved by direct integration, as in the cases of Theorems II, V
and VI. Let $z = x/y$, where $x$ and $y$ are independent $\gamma(l)$ and $\gamma(m)$
variates. Then for a fixed value of $y$ in the interval $dy$, $dx = y\,dz$,
and the probability that $z$ will lie in the interval $dz$ is

$$dp = e^{-yz}(yz)^{l-1}\,y\,dz/\Gamma(l).$$

Consequently the probability that, for any value of $y$, $z$ will lie in
the interval $dz$ is

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\,\Gamma(m)}\int_0^\infty e^{-y(1+z)}\,y^{l+m-1}\,dy = \frac{z^{l-1}\,dz}{B(l,m)\,(1+z)^{l+m}},$$

showing that $z$ is a $\beta_2(l, m)$ variate as required.
3. Show that, for the $\gamma(l)$ distribution,

$$(\text{mean} - \text{mode})/\sigma = 1/\sqrt{l} = \tfrac12\mu_3/\sigma^3.$$

Hence some writers prefer $\tfrac12\mu_3/\sigma^3$ to $\mu_3/\sigma^3$ as a definition of skewness.

Show that the excess of kurtosis of the distribution is $6/l$.

4. Show that the mean value of the positive square root of a
$\gamma(l)$ variate is $\Gamma(l+\tfrac12)/\Gamma(l)$. Hence prove that the mean deviation of
a normal variate from its mean is $\sigma\sqrt{2/\pi}$.

5. Prove that the mean value of the positive square root of a
$\beta_1(l, m)$ variate is $\Gamma(l+\tfrac12)\,\Gamma(l+m)/\{\Gamma(l)\,\Gamma(l+m+\tfrac12)\}$. Hence, using
the distribution of $r^2$ given in Ex. 1, for samples from an uncorrelated
bivariate normal population, show that the mean value of
$|r|$ is $\Gamma(\tfrac12(n-1))/\{\sqrt{\pi}\,\Gamma(\tfrac12 n)\}$.

6. A simple sample of $n$ values is drawn from a population with a
$\gamma(l)$ distribution. Show that, if $\bar{x}$ is the mean of the sample, $n\bar{x}$ is
a $\gamma(nl)$ variate; and deduce that $E(\bar{x}) = l$, and that the sampling
variance of $\bar{x}$ is $l/n$.

7. A simple sample of $n$ values is drawn from a population with
the exponential distribution whose probability density is $a e^{-ax}$,
$(0 \le x)$. Show that, if $\bar{x}$ is the mean of the sample, $na\bar{x}$ is a $\gamma(n)$
variate; and deduce that $E(\bar{x}) = 1/a$, and that the s.e. of $\bar{x}$ is $1/(a\sqrt{n})$.

8. Show that, if $v$ is the square of a $\gamma(l)$ variate, its probability
differential is $dp = v^{\frac12 l-1}e^{-\sqrt{v}}\,dv/\{2\Gamma(l)\}$.

9. Show that, if $x_i$ $(i = 1, \dots, n)$ are $n$ independent Gamma variates
with parameters $m_i$, any homogeneous function $F(x_1, \dots, x_n)$ of
these variates, of degree 0, is distributed independently of the sum
$\sum_i x_i$. (Cf. Pitman, 1937, 8, pp. 216-17.)

This may be done by considering the m.g.f. of the simultaneous
distribution of $F$ and $\sum x_i$. As explained in § 31 this m.g.f. is

$$E\{\exp(t_1\textstyle\sum x_i + t_2F)\} = C\int_0^\infty\!\cdots\!\int_0^\infty \exp(-\textstyle\sum x_i)\,\prod_i x_i^{m_i-1}\,\exp(t_1\textstyle\sum x_i + t_2F)\,dx_1\cdots dx_n,$$

where

$$1/C = \Gamma(m_1)\cdots\Gamma(m_n).$$

Substituting $y_i = (1-t_1)x_i$ we have

$$M(t_1, t_2) = C\,(1-t_1)^{-\sum m_i}\int_0^\infty\!\cdots\!\int_0^\infty \exp(-\textstyle\sum y_i)\,\prod_i y_i^{m_i-1}\,\exp\{t_2F(y_1, \dots, y_n)\}\,dy_1\cdots dy_n,$$

since $F$ is homogeneous of degree zero. Thus

$$M(t_1, t_2) = (\text{function of } t_1)\,(\text{function of } t_2),$$

and the variates $F$ and $\sum x_i$ are therefore independent.

10. Given that the incomplete Beta function is defined by

$$B_x(l, m) = \int_0^x x^{l-1}(1-x)^{m-1}\,dx,$$

and that $I_x(l, m) = B_x(l, m)/B(l, m)$, prove the relations

$$I_x(l, m) = 1 - I_{1-x}(m, l)$$

and

$$I_x(l+1, m+1) = x\,I_x(l, m+1) + (1-x)\,I_x(l+1, m).$$

11. Simultaneous sampling distribution of the mean and the
variance. The independence of the sampling distributions of the
mean $\bar{x}$ and the variance $S^2$ of a random sample of $n$ values from a
normal population, was proved by Fisher from the simultaneous
sampling distribution of these statistics. The probability that the
$n$ values of the sample will fall in the respective intervals $dx_1, \dots, dx_n$
is, by the theorem of compound probability,

$$dp = (\sigma\sqrt{2\pi})^{-n}\exp\left[-\sum_i (x_i-\mu)^2/2\sigma^2\right]dx_1\,dx_2\cdots dx_n,$$

the $n$ values being chosen independently from the normal population
of mean $\mu$ and variance $\sigma^2$. But

$$\sum_i (x_i-\mu)^2 = nS^2 + n(\bar{x}-\mu)^2,$$

so that

$$dp = (\sigma\sqrt{2\pi})^{-n}\exp[-n(\bar{x}-\mu)^2/2\sigma^2]\,\exp(-nS^2/2\sigma^2)\,dx_1\cdots dx_n.$$

Fisher proved by geometrical reasoning* that this probability
differential is expressible as

$$dp = C\exp[-n(\bar{x}-\mu)^2/2\sigma^2]\,d\bar{x}\;\exp(-nS^2/2\sigma^2)\,S^{n-2}\,dS,$$

where $C$ is a constant. Since this is of the form

$$dp = \phi_1(\bar{x})\,d\bar{x}\cdot\phi_2(S)\,dS,$$

the distributions of $\bar{x}$ and $S$ are independent. The forms of $\phi_1$
and $\phi_2$ show that $\bar{x}$ is normally distributed, with mean $\mu$ and
variance $\sigma^2/n$, and that $\tfrac12 nS^2/\sigma^2$ is a Gamma variate with
parameter $\tfrac12(n-1)$.

Another proof of the independence of $\bar{x}$ and $S^2$ will be given in § 77.

12. Show that the $r$th moment of a $\beta_1(l, m)$ distribution about
$x = 0$ is

$$\frac{l(l+1)\cdots(l+r-1)}{(l+m)(l+m+1)\cdots(l+m+r-1)}.$$

13. Show that, if $r$ is less than $m$, the $r$th moment of a $\beta_2(l, m)$
distribution about $x = 0$ is

$$\frac{l(l+1)\cdots(l+r-1)}{(m-1)(m-2)\cdots(m-r)}.$$

14. Defining the harmonic mean (h.m.) of a variate $x$ as the
reciprocal of the expected value of $1/x$ show that, if $l > 1$, the h.m.
of a $\gamma(l)$ variate is $l-1$, that of a $\beta_1(l, m)$ variate is $(l-1)/(l+m-1)$,
and that of a $\beta_2(l, m)$ variate is $(l-1)/m$, $m$ being positive.

* Fisher, 1925, 1, pp. 92-3.
CHI-SQUARE AND SOME APPLICATIONS

74. Chi-square and its distribution

We have already seen (§ 68, Theorem IV) that if $x_i$ $(i = 1, 2, \dots, n)$
are $n$ independent variates normally distributed about a common
mean zero with standard deviations $\sigma_i$, and

$$\chi^2 = \sum_i x_i^2/\sigma_i^2, \tag{1}$$

then $\tfrac12\chi^2$ is a Gamma variate with parameter $\tfrac12 n$. The distribution
of $\chi^2$ is therefore

$$dP = \frac{1}{\Gamma(\tfrac12 n)}\,e^{-\frac12\chi^2}\,(\tfrac12\chi^2)^{\frac12 n-1}\,d(\tfrac12\chi^2), \tag{2}$$

which expresses the probability that the value of $\chi^2$ found from a
random sample will fall in the interval $d\chi^2$. This distribution is
often referred to as the $\chi^2$ distribution, and a variate conforming to
it is said to be distributed like $\chi^2$. Since $\chi^2$ is twice a $\gamma(\tfrac12 n)$ variate,
its mean value is $n$ and its modal value $n-2$, as proved in § 67.
Similarly, the variance of $\chi^2$ is $2n$.
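The mean $n$ and variance $2n$ can be illustrated by simulating definition (1) directly. The sketch below is an assumption-laden illustration rather than anything given in the text: the standard deviations $\sigma_i$ are chosen arbitrarily and unequal, to emphasise that each variate is divided by its own s.d. before squaring. Names, seed and sample size are hypothetical.

```python
import random


def chi_square_draw(sigmas, rng):
    """One value of chi^2 = sum (x_i / sigma_i)^2, with x_i ~ N(0, sigma_i^2) independent."""
    return sum((rng.gauss(0.0, s) / s) ** 2 for s in sigmas)


def chi_square_sample(sigmas, draws, seed):
    """A list of simulated chi^2 values, one per simulated set of n variates."""
    rng = random.Random(seed)
    return [chi_square_draw(sigmas, rng) for _ in range(draws)]


def mean_and_variance(values):
    """Sample mean and (population-style) variance of a list of numbers."""
    count = len(values)
    mean = sum(values) / count
    return mean, sum((v - mean) ** 2 for v in values) / count


if __name__ == "__main__":
    sigmas = [1.0, 2.0, 0.5, 3.0, 1.5]  # n = 5 arbitrary standard deviations
    mean, var = mean_and_variance(chi_square_sample(sigmas, 50_000, seed=11))
    print("mean (expected 5):", mean)
    print("variance (expected 10):", var)
```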

The distribution (2) was discovered by Helmert in 1875, and
rediscovered independently in 1900 by Karl Pearson, who devised
by means of it the $\chi^2$ test of 'goodness of fit' which will soon be
considered. We must first, however, examine the effect of one or
more linear relations between the variates $x_i$ and, to make this
clearer, we shall digress briefly to remind the reader of the
properties of an orthogonal linear transformation.

75. Orthogonal linear transformation

Let the $n$ variates $x_i$ be subjected to the linear transformation

$$\xi_i = \sum_j c_{ij}\,x_j \qquad (i = 1, \dots, n). \tag{3}$$

If the constant coefficients $c_{ij}$ are such that

$$\sum_i \xi_i^2 = \sum_i x_i^2, \tag{4}$$

the transformation is said to be orthogonal. In this case the
coefficients satisfy the relations

$$\sum_i c_{ij}^2 = 1 = \sum_i c_{ji}^2 \tag{5}$$

and

$$\sum_i c_{ij}c_{ik} = 0 = \sum_i c_{ji}c_{ki} \qquad (j \ne k). \tag{6}$$

These relations may be expressed verbally by saying that, in the
determinant $|c_{ij}|$ of the coefficients, the sum of the squares of the
elements in any row or in any column is equal to unity, while the
sum of the products of corresponding elements in any two rows or
in any two columns is equal to zero. The determinant of the
coefficients is equal to $\pm 1$; and consequently the Jacobian of the $\xi$'s
with respect to the $x$'s is also $\pm 1$. By changing the sign of one of
the $\xi$'s, if necessary, we may ensure that the value is $+1$; and we
shall assume in all cases that this has been done. It follows that

$$\frac{\partial(x_1, \dots, x_n)}{\partial(\xi_1, \dots, \xi_n)} = 1. \tag{7}$$

It is easy to prove that, if the $x$'s are statistically independent
variates, normally distributed about zero with unit s.d., so are
the $\xi$'s. For, the probability that simultaneously the $n$ values of
the variates $x_i$ will fall in the respective intervals $dx_i$ is the product
of the probabilities for the individual variates, and is therefore
given by

$$dP = (2\pi)^{-\frac12 n}\exp\left(-\tfrac12\sum_i x_i^2\right)dx_1\,dx_2\cdots dx_n.$$

The probability density for the joint distribution of the variates
$x_i$ is therefore

$$(2\pi)^{-\frac12 n}\exp\left(-\tfrac12\sum_i x_i^2\right) = (2\pi)^{-\frac12 n}\exp\left(-\tfrac12\sum_i \xi_i^2\right).$$

But the 'volume' of the element bounded by the $n$ pairs of
hypersurfaces $\xi_i$, $\xi_i + d\xi_i$ is

$$\frac{\partial(x_1, \dots, x_n)}{\partial(\xi_1, \dots, \xi_n)}\,d\xi_1\,d\xi_2\cdots d\xi_n = d\xi_1\,d\xi_2\cdots d\xi_n,$$

so that the probability that the $\xi$'s will fall simultaneously in their
respective intervals is expressible as

$$dP = (2\pi)^{-\frac12}\exp(-\tfrac12\xi_1^2)\,d\xi_1 \cdots (2\pi)^{-\frac12}\exp(-\tfrac12\xi_n^2)\,d\xi_n.$$

From the form of this expression it follows that the $\xi$'s are
statistically independent, and are normally distributed about zero with
unit s.d.

The transformation (3) corresponds to a rotation of rectangular
axes in Euclidean space of $n$ dimensions. In the choice of the
coefficients $c_{ij}$ there is a degree of arbitrariness. For instance the
coefficients $c_{1i}$ for $\xi_1$ may be chosen first, subject only to the condition

$$\sum_i c_{1i}^2 = 1,$$

which ensures that the constants $c_{1i}$ are the components of a unit
vector. When $\xi_1$ has been determined $\xi_2$ may be chosen in an infinity
of ways, subject only to the conditions

$$\sum_i c_{2i}^2 = 1, \qquad \sum_i c_{1i}c_{2i} = 0,$$

which express that $c_{2i}$ are components of a unit vector, orthogonal
to the vector whose components are $c_{1i}$. At each step there is freedom
of choice of an axis orthogonal to those already chosen; and only
in the case of the $n$th axis is there no freedom of choice.
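The step-by-step choice of axes described above can be sketched in code. This is one possible construction, assumed here for illustration (the text does not prescribe a procedure): the first row is a given vector scaled to unit length, and the remaining rows are obtained by Gram-Schmidt orthogonalisation of the standard basis vectors. The function names are hypothetical.

```python
import math


def orthogonal_matrix_with_first_row(first_row):
    """Build an n x n orthogonal matrix whose first row is first_row scaled to unit length.

    The remaining rows come from Gram-Schmidt applied to the standard basis
    vectors; this is only one of the infinitely many admissible choices.
    """
    n = len(first_row)
    norm = math.sqrt(sum(a * a for a in first_row))
    rows = [[a / norm for a in first_row]]
    for k in range(n):
        if len(rows) == n:
            break
        candidate = [1.0 if j == k else 0.0 for j in range(n)]
        # Remove the components along every row already chosen.
        for r in rows:
            dot = sum(c * rj for c, rj in zip(candidate, r))
            candidate = [c - dot * rj for c, rj in zip(candidate, r)]
        length = math.sqrt(sum(c * c for c in candidate))
        if length > 1e-9:  # skip a basis vector lying in the span of earlier rows
            rows.append([c / length for c in candidate])
    return rows


def transform(rows, x):
    """Apply xi_i = sum_j c_ij x_j, as in (3)."""
    return [sum(c * xj for c, xj in zip(row, x)) for row in rows]


if __name__ == "__main__":
    c = orthogonal_matrix_with_first_row([1.0, 1.0, 1.0, 1.0])
    x = [3.0, -1.0, 2.0, 0.5]
    xi = transform(c, x)
    print("sum of squares of x :", sum(v * v for v in x))
    print("sum of squares of xi:", sum(v * v for v in xi))
```

With the first row proportional to $(1, 1, \dots, 1)$ the first new variate is $\sqrt{n}$ times the mean of the $x$'s.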

76. Linear constraints. Degrees of freedom

As above let the variates $x_i$ be normally distributed about zero
as mean, with unit s.d. To begin with we assume that they are
functionally independent; but presently we shall impose on them
the linear restriction

$$a_1x_1 + a_2x_2 + \dots + a_nx_n = 0, \tag{8}$$

an equation which may be divided throughout by the constant
necessary to make $\sum_i a_i^2 = 1$: we assume that this has been done.

If the variates $x_i$ are independent, we may obtain by an orthogonal
linear transformation a set of $n$ statistically independent variates
$\xi_i$, each of which is normally distributed about zero as mean, with
unit s.d.; and this may be done so that $\xi_1$ is identical with the first
member of (8). Now let the condition (8) be imposed on the $x$'s.
This is equivalent to putting $\xi_1 = 0$; and, since the variates $\xi_2, \dots, \xi_n$
are statistically independent of $\xi_1$, their distributions are unaltered
by the condition so imposed. Consequently

$$\tfrac12\chi^2 = \tfrac12\sum_{i=1}^{n} x_i^2 = \sum_{i=2}^{n} \tfrac12\xi_i^2,$$

where the $\tfrac12\xi_i^2$ are Gamma variates, each with parameter $\tfrac12$. Thus $\tfrac12\chi^2$
is the sum of $n-1$ independent $\gamma(\tfrac12)$ variates, and is therefore itself
a Gamma variate with parameter $\tfrac12(n-1)$. The distribution of $\chi^2$
is therefore obtained from (2) by putting $n-1$ in place of $n$. The
condition (8) has reduced the number of independent variates by
one. The number of independent variates is usually called the
number of degrees of freedom (d.f.), or briefly the number of freedoms.
The term is borrowed from geometry and mechanics, where the
position of a point or of a body is specified by a number of functionally
independent variables called coordinates. Each independent
coordinate corresponds to one degree of freedom of movement. Any
constraint on the body reduces the number of degrees of freedom.
For this reason a linear relation between the variates $x_i$ is called a
linear constraint. We shall assume that only linear constraints are
involved.

An appeal to geometry throws light on the above reasoning. If
the variables $x_i$ are regarded as rectangular Cartesian coordinates
of the current point $P$ in Euclidean space of $n$ dimensions, the
square of the distance of $P$ from the origin $O$ is

$$OP^2 = \sum_i x_i^2 = \chi^2,$$

and, in terms of the alternative set $\xi_i$ of coordinates relative to
rectangular axes through the same origin,

$$OP^2 = \sum_i \xi_i^2 = \chi^2.$$

When the variables are connected by the equation (8), the point $P$
is constrained to lie on a hyperplane through the origin, determined
by this equation. In terms of the $\xi$'s the equation of the hyperplane
is simply

$$\xi_1 = 0,$$

and for a point $P$ on this hyperplane we have

$$\chi^2 = \sum_{i=1}^{n} x_i^2 = \sum_{i=2}^{n} \xi_i^2$$

as above. Thus the linear constraint (8) restricts the freedom of
movement of $P$ to the hyperplane, in which there are only $n-1$
independent coordinates. $\chi^2$ is then said to correspond to $n-1$ d.f.

That each linear constraint reduces the number of freedoms by
unity may be shown in a similar manner. Thus if there is a second
linear constraint

$$b_1'x_1 + b_2'x_2 + \dots + b_n'x_n = 0, \tag{9}$$

it is expressible in terms of the $\xi$'s in the form

$$b_2\xi_2 + b_3\xi_3 + \dots + b_n\xi_n = 0, \tag{10}$$

in which $\sum_{i=2}^{n} b_i^2 = 1$. We obtain $\chi^2$ as the sum of squares of $n-1$
standard normal variates $\xi_i$ $(i = 2, \dots, n)$, which are now connected
by the linear constraint (10). Proceeding as before by an appropriate
orthogonal transformation from the $\xi_i$ to new variates $\eta_i$ $(i = 2, \dots, n)$,
in which $\eta_2$ is identical with the first member of (10), we may
express $\chi^2$ as

$$\chi^2 = \sum_{i=3}^{n} \eta_i^2,$$

and $\tfrac12\chi^2$ is thus a Gamma variate with parameter $\tfrac12(n-2)$, so that $\chi^2$
corresponds to $n-2$ d.f. The argument holds for any number of
linear constraints. Following Yule and Kendall* we shall denote
the number of freedoms by $\nu$. Then, if the $n$ variates are subject to
$m$ linear constraints,

$$\nu = n - m. \tag{11}$$

In place of (2) we thus have for the distribution of $\chi^2$, corresponding
to $\nu$ d.f.,

$$dP = \frac{1}{\Gamma(\tfrac12\nu)}\,e^{-\frac12\chi^2}\,(\tfrac12\chi^2)^{\frac12\nu-1}\,d(\tfrac12\chi^2). \tag{12}$$

Since $\tfrac12\chi^2$ is a $\gamma(\tfrac12\nu)$ variate, the mean value of $\chi^2$ is $\nu$ and its modal
value is $\nu-2$. The probability that the value of $\chi^2$ from a random
sample will not exceed a fixed value $\chi_0^2$ is obtained by integrating
(12) with respect to $\chi^2$ from 0 to $\chi_0^2$. Similarly, the probability that
$\chi^2$ will exceed $\chi_0^2$ is the integral† of (12) with respect to $\chi^2$ from
$\chi_0^2$ to $\infty$. The accompanying Table 3 gives the values of $\chi_0^2$ for values
of $\nu$ from 1 to 30, and for various fixed values of the probability $P$
of exceeding $\chi_0^2$. In other words, $P$ is the probability that in random
sampling the value of $\chi^2$ shown in the body of the table will be
exceeded.

* 1937, 1, p. 416.
† For a method of evaluating this integral see Fisher, 1935, 1, pp. 355-7.
See also Ex. IX, 9.

Example. Show that, for 2 d.f., the probability $P$ of a value of $\chi^2$ greater
than $\chi_0^2$ is $\exp(-\tfrac12\chi_0^2)$, and hence that $\chi_0^2 = 2\log_e (1/P)$.
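The result of the Example can be tried numerically. The sketch below goes slightly beyond the text and should be read as an assumption-labelled illustration: for an even number $\nu$ of d.f., repeated integration by parts of (12) gives the finite sum $P = e^{-h}\sum_{k=0}^{\nu/2-1} h^k/k!$ with $h = \tfrac12\chi_0^2$ (a standard result, not stated in the source), which reduces to $\exp(-\tfrac12\chi_0^2)$ when $\nu = 2$. The function names are hypothetical.

```python
import math


def chi2_upper_tail_even(chi2_0, nu):
    """P(chi^2 > chi2_0) for an even number nu of degrees of freedom.

    Uses the finite sum exp(-h) * sum_{k < nu/2} h^k / k!, with h = chi2_0 / 2.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form needs a positive even nu")
    half = chi2_0 / 2.0
    term, total = 1.0, 0.0
    for k in range(nu // 2):
        if k > 0:
            term *= half / k  # builds h^k / k! step by step
        total += term
    return math.exp(-half) * total


def chi2_0_for_two_df(p):
    """Value of chi^2 exceeded with probability p when nu = 2: 2 log_e(1/p)."""
    return 2.0 * math.log(1.0 / p)


if __name__ == "__main__":
    print(chi2_0_for_two_df(0.05), chi2_0_for_two_df(0.01))
    print(chi2_upper_tail_even(9.49, 4))
```

The values $2\log_e 20$ and $2\log_e 100$ agree, to two decimals, with the tabulated 5.99 and 9.21 for $\nu = 2$.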

77. Distribution of the 'sum of squares' for a random sample
from a normal population

Consider a random sample of $n$ independent values $x_i$ from a
normal population of variance $\sigma^2$. If as usual $\bar{x}$ denotes the mean
of the sample, the sum of squares of the deviations from the mean is

$$nS^2 = \sum_i (x_i - \bar{x})^2. \tag{13}$$

This is the 'sum of squares' to be considered; and we shall prove
that $nS^2/\sigma^2$ is distributed like $\chi^2$ for $n-1$ d.f. We know that

$$\sum_i x_i^2 = \sum_i (x_i - \bar{x})^2 + n\bar{x}^2. \tag{14}$$

If then we introduce an orthogonal linear transformation (3) of the
variables $x_i$ such* that $\xi_1 = \bar{x}\sqrt{n}$, we have by (14)

$$\sum_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} \xi_i^2 - \xi_1^2 = \sum_{i=2}^{n} \xi_i^2,$$

and therefore

$$nS^2/\sigma^2 = \sum_{i=2}^{n} \xi_i^2/\sigma^2. \tag{15}$$

Now, with origin at the mean of the population, the $x$'s are normally
distributed about zero as mean with s.d. $\sigma$, and so also are the $\xi$'s.
Consequently $\tfrac12 nS^2/\sigma^2$ is a Gamma variate with parameter $\tfrac12(n-1)$,
and its distribution is

$$dP = \frac{1}{\Gamma(\tfrac12(n-1))}\left(\frac{nS^2}{2\sigma^2}\right)^{\frac12(n-3)}\exp\left(-\frac{nS^2}{2\sigma^2}\right)d\left(\frac{nS^2}{2\sigma^2}\right).$$

In other words $nS^2/\sigma^2$ is distributed like $\chi^2$ with $n-1$ d.f.

* Cf. Sawkins, 1940, 3, p. 225.

It is convenient to express the result in terms of the statistic $s^2$
of § 64, which is an unbiased estimate of the variance of the
population. Thus

$$nS^2 = (n-1)s^2 = \nu s^2,$$

and the distribution of $s^2$ is therefore

$$dP = \frac{1}{\Gamma(\tfrac12\nu)}\left(\frac{\nu s^2}{2\sigma^2}\right)^{\frac12(\nu-2)}\exp\left(-\frac{\nu s^2}{2\sigma^2}\right)d\left(\frac{\nu s^2}{2\sigma^2}\right), \tag{16}$$

the estimate $s^2$ being based on $\nu$ d.f.

The coefficient of $ds^2$ in (16) is the probability density in the
distribution of $s^2$. Suppose that the population variance is unknown,
and that we enquire what value of it would make the probability
density of $s^2$ a maximum for the given value of $s^2$. This is obtained
by equating to zero its derivative with respect to $\sigma^2$. The procedure
leads to $\sigma^2 = s^2$. For this reason $s^2$ is called the optimum value of
the population variance corresponding to the given sample; and the
method of obtaining it is called the method of maximum likelihood.
It is due to R. A. Fisher.
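The maximisation described above can be imitated by a crude numerical search. The following sketch is illustrative only, with hypothetical function names and an arbitrary grid: it evaluates the logarithm of the coefficient of $ds^2$ in (16) over trial values of $\sigma^2$ and reports the value giving the largest density, which should fall at $\sigma^2 = s^2$ up to the grid spacing.

```python
import math


def log_density_s2(s2, sigma2, nu):
    """Logarithm of the probability density of s^2 implied by (16).

    With h = nu s^2 / (2 sigma^2), the density of s^2 is the gamma(nu/2)
    density of h multiplied by dh/d(s^2) = nu / (2 sigma^2).
    """
    h = nu * s2 / (2.0 * sigma2)
    return ((nu / 2.0 - 1.0) * math.log(h) - h - math.lgamma(nu / 2.0)
            + math.log(nu / (2.0 * sigma2)))


def most_likely_sigma2(s2, nu, lo=0.01, hi=100.0, steps=100_000):
    """Grid search for the trial sigma^2 that maximises the density of the observed s^2."""
    best_sigma2, best_value = lo, -math.inf
    for i in range(steps + 1):
        sigma2 = lo + (hi - lo) * i / steps
        value = log_density_s2(s2, sigma2, nu)
        if value > best_value:
            best_sigma2, best_value = sigma2, value
    return best_sigma2


if __name__ == "__main__":
    print(most_likely_sigma2(s2=10.62, nu=11))
```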

In the above argument $\xi_1$ is independent of $\xi_2, \dots, \xi_n$ and
therefore, in virtue of (15), $\bar{x}$ is independent of $S^2$. Thus the sampling
distributions of the mean and the variance are independent. And,
since $\xi_1/\sigma$ is a standard normal variate, so is $\bar{x}\sqrt{n}/\sigma$. It follows that
$\bar{x}$ is normally distributed with the same mean as the population,
and with variance $\sigma^2/n$.


78. Nature of the chi-square test. An illustration

The $\chi^2$ test is a means of judging the credibility of an hypothesis
concerning the population (or populations) from which the values
of the sample (or samples) are drawn. The hypothesis to be tested
must be of such a nature that we can determine, from the sample
values of the variate, the corresponding value of a certain statistic,
which is distributed like $\chi^2$ for a known number of freedoms. Table 3
then tells us the probability $P$ that, in random sampling, a
value of this statistic will occur greater than the value actually
obtained. If this probability is very small, we regard the value
obtained as significantly large, and conclude that the hypothesis is
probably incorrect.

Table 3. Values of $\chi^2$ with probability $P$ of
being exceeded in random sampling

$\nu$ = number of degrees of freedom

  ν    P=0.99    0.95    0.50    0.30    0.20    0.10    0.05    0.01
  1    0.0002   0.004    0.46    1.07    1.64    2.71    3.84    6.64
  2     0.020   0.103    1.39    2.41    3.22    4.60    5.99    9.21
  3     0.115    0.35    2.37    3.66    4.64    6.25    7.82   11.34
  4      0.30    0.71    3.36    4.88    5.99    7.78    9.49   13.28
  5      0.55    1.14    4.35    6.06    7.29    9.24   11.07   15.09
  6      0.87    1.64    5.35    7.23    8.56   10.64   12.59   16.81
  7      1.24    2.17    6.35    8.38    9.80   12.02   14.07   18.48
  8      1.65    2.73    7.34    9.52   11.03   13.36   15.51   20.09
  9      2.09    3.32    8.34   10.66   12.24   14.68   16.92   21.67
 10      2.56    3.94    9.34   11.78   13.44   15.99   18.31   23.21
 11      3.05    4.58   10.34   12.90   14.63   17.28   19.68   24.72
 12      3.57    5.23   11.34   14.01   15.81   18.55   21.03   26.22
 13      4.11    5.89   12.34   15.12   16.98   19.81   22.36   27.69
 14      4.66    6.57   13.34   16.22   18.15   21.06   23.68   29.14
 15      5.23    7.26   14.34   17.32   19.31   22.31   25.00   30.58
 16      5.81    7.96   15.34   18.42   20.46   23.54   26.30   32.00
 17      6.41    8.67   16.34   19.51   21.62   24.77   27.59   33.41
 18      7.02    9.39   17.34   20.60   22.76   25.99   28.87   34.80
 19      7.63   10.12   18.34   21.69   23.90   27.20   30.14   36.19
 20      8.26   10.85   19.34   22.78   25.04   28.41   31.41   37.57
 21      8.90   11.59   20.34   23.86   26.17   29.62   32.67   38.93
 22      9.54   12.34   21.34   24.94   27.30   30.81   33.92   40.29
 23     10.20   13.09   22.34   26.02   28.43   32.01   35.17   41.64
 24     10.86   13.85   23.34   27.10   29.55   33.20   36.42   42.98
 25     11.52   14.61   24.34   28.17   30.68   34.38   37.65   44.31
 26     12.20   15.38   25.34   29.25   31.80   35.56   38.88   45.64
 27     12.88   16.15   26.34   30.32   32.91   36.74   40.11   46.96
 28     13.56   16.93   27.34   31.39   34.03   37.92   41.34   48.28
 29     14.26   17.71   28.34   32.46   35.14   39.09   42.56   49.59
 30     14.95   18.49   29.34   33.53   36.25   40.26   43.77   50.89

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.

Two conventional values of $P$ are employed in deciding
significance, viz. 0.05 and 0.01. These determine the 5 % and the 1 %
levels of significance respectively. If the value of $P$ obtained is
greater than 0.05 we infer that, in more than 5 % of random samples
from the population in question, the value of $\chi^2$ obtained would be
greater than that actually found, which is therefore regarded as not
significantly large. If, however, $P$ is less than 0.05 the value found
is regarded as significant at that level. Similar remarks apply to the
1 % level of significance. Values which are significant at the 1 %
level of probability are said to be highly significant, and are
sometimes distinguished by a double asterisk. Significance at the 5 %
level is then denoted by a single asterisk.

The value of $\chi^2$ obtained from the sample may also be
significantly small. A glance at the probability curve of $\chi^2$ will help to
make this point clear. When the number $\nu$ of degrees of freedom is
greater than 2, this curve has the form indicated in the diagram,
except that when $\nu = 3$ it touches the $y$-axis at the origin, and when
$\nu = 4$ it touches neither axis at that point.

[Diagram: probability curve of $\chi^2$ for $\nu > 2$.]

The ordinate for any
value of $\chi^2$ is the probability density for that value; and this is small
for small values of $\chi^2$. When the value of $P$ found from the table is
greater than 0.95, the probability of obtaining a smaller value of $\chi^2$
is less than 5 %, and the sample value must be regarded as
significantly small. Similarly, when $P$ is greater than 0.99 the probability
of a smaller value of $\chi^2$ is less than 1 %, and the smallness of the
sample value is highly significant.

As an illustration of the $\chi^2$ test we may consider the following.
Let $x_i$ be a random sample of $n$ values from a normal population
of variance $\sigma^2$, $\bar{x}$ the sample mean and

$$nS^2 = \sum_i (x_i - \bar{x})^2.$$

Then we know that $nS^2/\sigma^2$ is distributed like $\chi^2$ with $n-1$ d.f.
If then we are given the sample values but have no certain
knowledge of the population from which they were drawn, we may test
the hypothesis that they came from a normal population of s.d. $\sigma$.
For, when $S^2$ has been found from the sample, our hypothesis gives
$nS^2/\sigma^2$ as the sample value of $\chi^2$, and the table then enables us to
decide whether this value is significant or not, that is to say whether
our hypothesis is improbable or not.

Example. A random sample of 12 values gave an unbiased estimate $s^2$ of
the population variance equal to 10.62 mm.² May the sample be reasonably
regarded as from a normal population with variance 7 mm.²?

Here $nS^2 = (n-1)s^2 = 11 \times 10.62 = 116.82$, and, according to our hypothesis
concerning the population,

$$\chi^2 = 116.82/7 = 16.7.$$

The number of d.f. is 11. From the table we see that the probability, $P$, that
the value of $\chi^2$ in such samples will exceed 16.7 is greater than 0.10. The
value found is therefore not significant, and the test provides no evidence
against the hypothesis of a normal population with variance 7.
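The arithmetic of this Example is easily mechanised. The following is a small sketch with hypothetical names: it computes $\chi^2 = (n-1)s^2/\sigma_0^2$ for a hypothetical population variance $\sigma_0^2$ and compares it with the entries of Table 3 for 11 d.f., which are copied into the code by hand.

```python
def chi_square_for_variance(s2_unbiased, n, sigma0_sq):
    """Return (chi^2, degrees of freedom) for testing a hypothetical variance sigma0_sq.

    Uses n S^2 = (n - 1) s^2, so chi^2 = (n - 1) s^2 / sigma0_sq with n - 1 d.f.
    """
    nu = n - 1
    return nu * s2_unbiased / sigma0_sq, nu


# Entries of Table 3 for nu = 11, keyed by the probability P of being exceeded.
TABLE_ROW_11 = {0.10: 17.28, 0.05: 19.68, 0.01: 24.72}


def significant_at(chi2, level, table_row):
    """True if chi2 exceeds the tabulated value for the given level."""
    return chi2 > table_row[level]


if __name__ == "__main__":
    chi2, nu = chi_square_for_variance(10.62, 12, 7.0)
    print("chi-square:", round(chi2, 1), "on", nu, "d.f.")
    print("significant at 5 %:", significant_at(chi2, 0.05, TABLE_ROW_11))
```

Since 16.7 is below the 10 % point 17.28, the value of $P$ exceeds 0.10, as stated in the Example.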

79. Test of goodness of fit

We shall now consider the $\chi^2$ test of goodness of fit devised by
Karl Pearson. As in Chapter VII let the population be one whose
members may be separated into a number $k$ of classes, and let $p_i$
be the relative frequency of the $i$th class. In the choice of a simple
sample of $n$ members from this population, the probability at any
drawing that the individual selected will belong to the $i$th class is $p_i$.
The frequency $f_i$ of this class in sampling has a mean value $m_i$
given by

$$m_i = np_i, \tag{17}$$

and the distribution of $f_i$ is the binomial. For large values of $n$ the
binomial distribution approximates to normal; and the sampling
distribution of any class frequency is therefore approximately
normal for large samples.

Next suppose that we are given a set of class frequencies $f_i$,
whose sum is $n$, without any information about the source of the
values. We may wish to enquire whether they may reasonably be
regarded as those of a simple sample from a certain hypothetical
population. The population is hypothetical in the sense that it is
determined by an hypothesis, which must be such as to enable us to
calculate the expected values $m_i$ of the class frequencies in random
sampling from it. We shall prove that, for large samples, the variate
$\sum_i (f_i - m_i)^2/m_i$ conforms approximately to the $\chi^2$ distribution, and
shall show how the number of degrees of freedom is determined.
The value of $\chi^2$ obtained from the sample enables us to test the
credibility of our hypothesis.

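Since, for large samples, the class frequency $f_i$ is distributed 
approximately normally about $m_i$ as mean, the variate 

$$x_i = f_i - m_i \tag{18}$$

is distributed approximately normally about zero as mean. We 
require the variance $\sigma_i^2$ of $x_i$ when the class frequencies are entirely 
independent; and, to obtain this, we must take account* of the 
variation in their sum n. Thus 

$$n = \sum_i f_i, \tag{19}$$

and the size n of the sample varies about a mean n with S.D. $\sigma_n$. 
Since the frequencies $f_i$ are supposed independent, the variance of 
n is given by 

$$\sigma_n^2 = \sum_i \sigma_i^2. \tag{20}$$

Now, in virtue of §47 (8), the variance $\sigma_i^2$ for samples of varying 
size is 

$$\sigma_i^2 = np_iq_i + p_i^2\sigma_n^2. \tag{21}$$

Summing for all the classes we have, by (20), 

$$\sigma_n^2 = n\sum_i p_iq_i + \sigma_n^2\sum_i p_i^2,$$

or, since $\sum_i p_i = 1$ and $q_i = 1 - p_i$, 

$$\sigma_n^2\Bigl(1 - \sum_i p_i^2\Bigr) = n\Bigl(1 - \sum_i p_i^2\Bigr).$$

Hence $\sigma_n^2 = n$, so that (21) is equivalent to 

$$\sigma_i^2 = n(p_iq_i + p_i^2) = np_i = m_i. \tag{22}$$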

The quantity defined by 

$$\chi^2 = \sum_i x_i^2/\sigma_i^2 = \sum_i (f_i - m_i)^2/m_i \tag{23}$$

is a sum of squares of standard approximately normal deviates, 
and is therefore distributed approximately like $\chi^2$. 

* Cf. Fisher, 1922, 1, p. 88. 

For samples of fixed size, the number of D.F. is clearly less than 
the number k of classes. For the sum of the class frequencies is 
constant, and this corresponds to a linear constraint on the variates 
$x_i$. Further, to determine the theoretical class frequencies $m_i$, it 
is sometimes necessary to estimate parameters of the population 
from the data of the sample. For instance, in testing the hypothesis 
of a normal population, it may be necessary to estimate the mean 
and the variance of the population from the sample values. Each 
estimate of a parameter obtained in this manner corresponds to the 
introduction of a linear constraint. For the moments about the 
mean are linear, or approximately linear, functions of the class 
frequencies; and, in equating such a function to a parameter, we 
introduce an approximately linear constraint on the variates $x_i$. 
In calculating the number of degrees of freedom, each constraint 
introduced in this manner must be taken into account. 

The approximation of the binomial distribution to normal, when 
n is large, does not hold for very small values of p or q. Hence the 
above argument is not valid if one of the class frequencies is small; 
for that would make the corresponding relative frequency $f/n$ very 
small, and therefore the probability p for that class in sampling 
from the population also very small. Classes of small frequency 
may be treated by combining two or more of them to form a class 
sufficiently large. 
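
The procedure of this section may be sketched in a few lines of Python. 
The sketch below is an illustration only: the function names, the threshold 
of 5 for an "insufficient" theoretical frequency (a common rule of thumb, not 
taken from the text) and the frequencies in the usage note are all assumptions 
made for the purpose of the example. 

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes, left to right, until every pooled class
    has a theoretical frequency of at least `minimum` (assumed threshold)."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for f, m in zip(observed, expected):
        acc_obs += f
        acc_exp += m
        if acc_exp >= minimum:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0.0:
        # A leftover tail that is still too small joins the last pooled class.
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def chi_square(observed, expected):
    """Sum of (f_i - m_i)^2 / m_i over the classes, as in (23)."""
    return sum((f - m) ** 2 / m for f, m in zip(observed, expected))


def degrees_of_freedom(classes, estimated_parameters=0):
    """One freedom is lost for the fixed total, and one more for each
    parameter estimated from the sample."""
    return classes - 1 - estimated_parameters
```

With the hypothetical frequencies `[2, 9, 21, 30, 24, 11, 3]` observed against 
`[3, 10, 20, 34, 20, 10, 3]` expected, the two end classes are absorbed by their 
neighbours, leaving five classes; and `degrees_of_freedom(13, 2)` gives the 
10 D.F. used in Example 1 of the next section. 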


80. Numerical examples 


Example 1. Can the wages of 1,000 employees, given in Ex. I, 3 (p. 17), be 
regarded as a random sample from a normal population? 

Our hypothesis is that the population is normal. Since the sample is large 
its mean and its variance are taken as estimates of those of the population. 
In Ex. III, 6 (p. 80) the class frequencies per thousand of the normal population 
are given. On account of the smallness of the extreme frequencies we combine 
the first two classes, and also the last two, leaving 13 classes. Since the sum 
of the class frequencies is constant, and two of the parameters were estimated 
from the sample, the number of degrees of freedom is 10. The value of $\chi^2$ 
from the sample is 

$$\chi^2 = \frac{6^2}{18} + \frac{10^2}{28} + 0 + \frac{14^2}{79} + \dots + \frac{(3.2)^2}{18} = 19.84.$$


From the table we see that, for 10 D.F., this value of $\chi^2$ is significant at the 
5% level. We conclude that the assumption of a normal population is 
probably incorrect. 

Example 2. From the adult male populations of seven large cities, random 
samples of the sizes indicated below were taken, and the numbers of married 
and single men recorded. Do the data indicate any significant variation 
among the cities in the tendency of men to marry? 

City        A     B     C     D     E     F     G    Total 
Married   133   164   155   106   153   123   146     980 
Single     36    57    40    37    55    33    36     294 
Total     169   221   195   143   208   156   182    1274 

We test the hypothesis that there is no significant variation in the tendency 
mentioned. Then the men from each city may be regarded as a simple 
sample from a population in which the ratio of married men to single is 
approximately the same as in the column of totals. This ratio is 10 : 3. The 
theoretical frequencies for any city are then obtained by dividing the total 
for that city into two parts in this ratio. These frequencies are: 


City        A     B     C     D     E     F     G    Total 
Married   130   170   150   110   160   120   140     980 
Single     39    51    45    33    48    36    42     294 
Total     169   221   195   143   208   156   182    1274 

From these figures we have 

$$\chi^2 = \frac{3^2}{130} + \frac{6^2}{170} + \frac{5^2}{150} + \dots + \frac{6^2}{42} = 5.34.$$



To find the number of freedoms we observe that the sum of the frequencies 
of married and single men from any city is constant, being equal to the size 
of the sample from that city. This reduces the number of independent frequencies 
to 7. And further, a parameter of the population was estimated from 
the sample, namely, the ratio of the numbers of married and single men. 
Consequently $\nu = 6$. For this number of freedoms the probability of obtaining 
a larger value of $\chi^2$ than 5.34 is about 0.50. The value is therefore not 
significant, and the test furnishes no evidence against the hypothesis. 
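
The arithmetic of Example 2 may be checked with the following sketch. It 
assumes the observed frequencies read so that every row and column agrees 
with the printed totals (980, 294 and 1274), and it evaluates the probability 
by the finite series that holds for an even number of D.F. (the subject of 
Ex. 9 at the end of this chapter). The function names are ours. 

```python
import math

# Observed frequencies for cities A to G (assumed reading of the table).
married = [133, 164, 155, 106, 153, 123, 146]
single = [36, 57, 40, 37, 55, 33, 36]


def chi_square_homogeneity(rows):
    """Chi-square for a table whose rows are categories and whose columns
    are samples; theoretical frequencies use the pooled proportions."""
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(col_totals)
    stat = 0.0
    for row in rows:
        share = sum(row) / grand       # pooled proportion of this category
        for f, n in zip(row, col_totals):
            m = share * n              # theoretical frequency
            stat += (f - m) ** 2 / m
    return stat


def chi_square_tail_even(x, nu):
    """P(chi-square with even nu D.F. exceeds x), from the finite series
    exp(-x/2) * sum over k < nu/2 of (x/2)^k / k!."""
    if nu <= 0 or nu % 2:
        raise ValueError("nu must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(nu // 2))


chi2 = chi_square_homogeneity([married, single])
nu = len(married) - 1                  # 7 independent frequencies, 1 parameter
p = chi_square_tail_even(chi2, nu)
print(round(chi2, 2), nu, round(p, 2))
```

The pooled proportion 980/1274 reproduces the ratio 10 : 3 of the text, so the 
theoretical frequencies computed here are those of the second table. 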


Example 3. May the data in Ex. III, 7 (p. 61) be regarded as those of a 
random sample from a Poissonian distribution? 

The mean, m = 1.2, was estimated from the sample, and from it the 
theoretical frequencies per thousand of the Poissonian distribution were 
calculated in the example referred to. To apply the test we combine the 
last three classes. Then 

$$\chi^2 = \frac{(\ldots)^2}{301.2} + \frac{(\ldots)^2}{361.4} + \dots + \frac{(4.4)^2}{7.6} = 3.6.$$



Here the number of classes is 6; but the total frequency is constant, and m 
was estimated from the sample, so that $\nu = 4$. With 4 D.F. the probability 
that $\chi^2$ will exceed 3.6 is nearly 0.50. The value is therefore not at all 
significant, and the assumption of a Poissonian population is not discredited. 

81. Additive property of chi-square 

Theorem I. If the independent variates x and y conform to the $\chi^2$ 
distribution, with $\nu_1$ and $\nu_2$ D.F. respectively, then x + y is distributed 
like $\chi^2$ with $\nu_1 + \nu_2$ D.F. 

For $\tfrac12 x$ and $\tfrac12 y$ are independent Gamma variates with parameters 
$\tfrac12\nu_1$ and $\tfrac12\nu_2$ respectively. Therefore, by §68, Theorem II, $\tfrac12(x + y)$ is 
a Gamma variate with parameter $\tfrac12(\nu_1 + \nu_2)$. Consequently x + y 
conforms to the $\chi^2$ distribution with $\nu_1 + \nu_2$ D.F. 

Similarly, corresponding to §68, Theorem III, we may state 

Theorem II. If the sum of two independent, positive variates is 
distributed like $\chi^2$ with $\nu_1 + \nu_2$ D.F., and one of them is distributed like 
$\chi^2$ with $\nu_1$ D.F., then the other is distributed like $\chi^2$ with $\nu_2$ D.F. 

In the same way Theorem V of §69 and Theorem VIII of §73 
may be expressed as 

Theorem III. If the independent variates x and y are distributed 
like $\chi^2$ with $\nu_1$ and $\nu_2$ D.F. respectively, then x/(x + y) is a Beta variate 
of the first kind with parameters $\tfrac12\nu_1$ and $\tfrac12\nu_2$, while x/y is a Beta variate 
of the second kind with the same parameters. 
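
Theorem I may be illustrated numerically. The sketch below is not a proof: 
it merely evaluates the convolution of two $\chi^2$ densities by the midpoint 
rule and compares the result with the $\chi^2$ density for the summed D.F. 
The density formula is the usual one for $\chi^2$; the function names and the 
step count are our own choices. 

```python
import math


def chi_square_density(x, nu):
    """Ordinate of the chi-square distribution with nu D.F. at x > 0."""
    half = nu / 2.0
    return (x ** (half - 1.0) * math.exp(-x / 2.0)
            / (2.0 ** half * math.gamma(half)))


def density_of_sum(z, nu1, nu2, steps=20000):
    """Density at z of x + y, for independent chi-square variates x and y,
    by the midpoint rule applied to the convolution integral over (0, z)."""
    h = z / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += chi_square_density(x, nu1) * chi_square_density(z - x, nu2)
    return total * h


# The two values should agree closely if x + y has nu1 + nu2 = 7 D.F.
print(density_of_sum(5.0, 3, 4), chi_square_density(5.0, 7))
```

The midpoint rule is used because it never evaluates the integrand at the 
end points, where a density with fewer than 2 D.F. would be infinite. 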

82. Samples from an uncorrelated bivariate normal population. 
Distribution of the correlation coefficient 

The distribution of the coefficient of correlation, r, in samples 
from a normally correlated bivariate population was given by 
Fisher* in 1915. In the particular case of an uncorrelated population 
($\rho = 0$) the distribution of r is very simple. Let $\sigma_1$ and $\sigma_2$ be 
the S.D.'s of x and y in the population, and let the variates be measured 
from their means. Consider a random sample of n pairs of 
values $x_i$, $y_i$ from such a population. The variances $S_1^2$, $S_2^2$ of x, y 
in the sample are given by 

$$nS_1^2 = \sum_i (x_i - \bar{x})^2, \qquad nS_2^2 = \sum_i (y_i - \bar{y})^2,$$

and the correlation, r, in the sample is 

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{nS_1S_2}.$$

* Fisher, 1915, 2. A simple and excellent proof of a different character 
has recently been published by Sawkins (1944, 1). The proof for $\rho = 0$ given 
in this section is based on that of Sawkins. 


Owing to the nature of the sampling the n values $y_i$ are independent. 
Let these be subjected to an orthogonal transformation 
yielding n variates $\eta_i$, the first of which may be taken as 

$$\eta_1 = \sum_i y_i/\sqrt{n} = \sqrt{n}\,\bar{y},$$

since the sum of squares of the coefficients of the $y_i$ is unity. Then 

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \eta_i^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n\bar{y}^2 = nS_2^2 + \eta_1^2,$$

so that 

$$nS_2^2 = \sum_{i=2}^{n} \eta_i^2, \tag{i}$$

and this sum, divided by $\sigma_2^2$, is distributed like $\chi^2$ with n − 1 D.F. 
Further, from the above definition of r, 

$$r\sqrt{n}\,S_2 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{n}\,S_1} = \frac{\sum_i (x_i - \bar{x})\,y_i}{\sqrt{n}\,S_1}.$$

We may take this sum for the second variate, $\eta_2$, for the sum of 
the squares of the coefficients of the $y_i$ is unity, the orthogonal 
condition is satisfied by the coefficients of $\eta_1$ and $\eta_2$, and the 
variables x and y are independent. Since $\eta_2^2 = nr^2S_2^2$ it follows from 
(i) that 

$$\sum_{i=3}^{n} \eta_i^2 = n(1 - r^2)S_2^2.$$

Further, the $\eta_i/\sigma_2$ are independent 
standard normal variates. Thus $nr^2S_2^2/\sigma_2^2$ and $n(1 - r^2)S_2^2/\sigma_2^2$ are 
distributed independently like $\chi^2$ with 1 and n − 2 D.F. respectively; 
or we may express it by saying that the former is a 
$\chi_1^2$ and the latter a $\chi_{n-2}^2$. It follows that 

$$r^2 = \frac{nr^2S_2^2}{nS_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nr^2S_2^2/\sigma_2^2 + n(1 - r^2)S_2^2/\sigma_2^2} = \frac{\chi_1^2}{\chi_1^2 + \chi_{n-2}^2},$$

and therefore, by §81, Theorem III, $r^2$ is a $\beta_1(\tfrac12, \tfrac12(n - 2))$ variate, 
with distribution 

$$dP = \frac{(r^2)^{-1/2}(1 - r^2)^{(n-4)/2}\,d(r^2)}{B(\tfrac12, \tfrac12(n - 2))}.$$


We may thus state the important 

Theorem IV. For random samples of n pairs of values from an 
uncorrelated bivariate normal population, $r^2$ is a Beta variate of the 
first kind with parameters $\tfrac12$ and $\tfrac12(n - 2)$. 

The expected value of $r^2$ for such samples is therefore 1/(n − 1), 
and the S.E. of r is $1/\sqrt{n - 1}$. And since r ranges from −1 to 1, the 
opposite values ±r giving the same value of $r^2$, the distribution 
of r is 

$$dP = \frac{(1 - r^2)^{(n-4)/2}\,dr}{B(\tfrac12, \tfrac12(n - 2))}. \tag{25}$$
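
The two statements just made about (25), that it integrates to unity and 
that the expected value of $r^2$ is 1/(n − 1), may be checked numerically. The 
sketch below uses the midpoint rule; the function names, the choice n = 10 
and the number of steps are assumptions of the illustration. It is intended 
for n of 4 or more, for which the ordinate of (25) is bounded. 

```python
import math


def beta_function(l, m):
    """B(l, m) expressed through Gamma functions."""
    return math.gamma(l) * math.gamma(m) / math.gamma(l + m)


def r_density(r, n):
    """Ordinate of (25) for samples of n pairs from an uncorrelated
    bivariate normal population."""
    return (1.0 - r * r) ** ((n - 4) / 2.0) / beta_function(0.5, (n - 2) / 2.0)


def moment(power, n, steps=20000):
    """Midpoint-rule value of the integral of r**power times (25)
    over the range -1 to 1."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        r = -1.0 + (i + 0.5) * h
        total += r ** power * r_density(r, n)
    return total * h


n = 10
print(moment(0, n))                 # total probability
print(moment(2, n), 1.0 / (n - 1))  # E(r^2) against 1/(n - 1)
```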


In considering the linear regression of y on x we proved the 
relation (cf. §27 (31)) 

$$\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - Y_i)^2 + \sum_i (Y_i - \bar{y})^2, \tag{26}$$

where Y is the estimate of y from the regression equation. In the 
present notation this corresponds to the identity 

$$nS_2^2 = n(1 - r^2)S_2^2 + nr^2S_2^2. \tag{27}$$

We have just seen that, for samples from the above population, 
the two sums in the second member are independent. Thus the sum of squares of 
deviations from the line of regression is distributed independently 
of the sum of squares of deviations due to regression in the sample. 
When divided by $\sigma_2^2$ the three sums in (27) are distributed like $\chi^2$ 
with n − 1, n − 2 and 1 D.F. respectively, illustrating the additive 
property of $\chi^2$ stated in §81. 
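
The resolution (26), and its agreement term by term with (27), may be 
verified on any set of paired values. The sketch below does so; the function 
name and the six pairs in the usage note are hypothetical, chosen only for 
illustration. 

```python
def regression_sums(xs, ys):
    """Return the three sums of squares of (26), with the sample
    correlation r, for the linear regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)           # n * S1^2
    syy = sum((y - y_bar) ** 2 for y in ys)           # n * S2^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                                     # regression coefficient
    r = sxy / (sxx * syy) ** 0.5                      # sample correlation
    fitted = [y_bar + b * (x - x_bar) for x in xs]    # the estimates Y_i
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    explained = sum((f - y_bar) ** 2 for f in fitted)
    return {"total": syy, "residual": residual,
            "explained": explained, "r": r}


# Hypothetical data, for illustration only.
sums = regression_sums([1, 2, 3, 4, 5, 6], [2.0, 1.5, 3.5, 3.0, 5.0, 4.5])
print(sums["total"], sums["residual"] + sums["explained"])
```

The residual sum equals $n(1 - r^2)S_2^2$ and the sum due to regression equals 
$nr^2S_2^2$, which is the content of (27). 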


83. Distribution of regression coefficients and correlation ratios 

The distribution of the linear regression coefficient, b, of y on x 
follows from the above results. For 

$$b = rS_2/S_1,$$

so that 

$$\frac{b^2\sigma_1^2}{\sigma_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nS_1^2/\sigma_1^2}.$$

As we have just seen, the numerator and denominator of the second 
member are distributed independently like $\chi^2$ with 1 and n − 1 D.F. 
respectively. Consequently, by Theorem III, the quotient is a 
$\beta_2(\tfrac12, \tfrac12(n - 1))$ variate, and therefore $b^2\sigma_1^2/\sigma_2^2$ is a variate of the same 
type. Hence we may state 

Theorem V. In random samples of n pairs of values from an 
uncorrelated bivariate normal population, in which the standard 
deviations of the variables are $\sigma_1$ and $\sigma_2$, the linear regression coefficient 
b of y on x is such that $b^2\sigma_1^2/\sigma_2^2$ is a Beta variate of the second kind with 
parameters $\tfrac12$ and $\tfrac12(n - 1)$. 

Since the mean value of a $\beta_2(l, m)$ variate is l/(m − 1), the expected 
value of $b^2$ in sampling is $\sigma_2^2/(n - 3)\sigma_1^2$. And, since the range of b is 
from −∞ to +∞, the distribution of b is 

$$dP = \frac{\sigma_1\sigma_2^{\,n-1}\,db}{B(\tfrac12, \tfrac12(n - 1))\,(\sigma_2^2 + b^2\sigma_1^2)^{n/2}}. \tag{28}$$


The distribution of the correlation ratio, $\eta$, of y on x may be 
determined from the corresponding resolution of the sum of squares. 
With the notation of §34 let the subscript i distinguish the particular 
array of y's, while j indicates position in that array, $\bar{y}_i$ denoting the 
mean of the y's in the ith vertical array, and $n_i$ the frequency in 
that array. Then, in virtue of §34, we have the resolution 

$$\sum_i\sum_j (y_{ij} - \bar{y})^2 = \sum_i\sum_j (y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2, \tag{29}$$

the various sums being equal to the corresponding terms of the 
identity 

$$nS_2^2 = n(1 - \eta^2)S_2^2 + n\eta^2S_2^2. \tag{30}$$

Now the first member of (29), divided by $\sigma_2^2$, is distributed like 
$\chi^2$ with n − 1 D.F. Further, since $\sum_j$ denotes summation over the values 
of any particular array, $\sum_j (y_{ij} - \bar{y}_i)^2/\sigma_2^2$ is distributed like $\chi^2$ with 
$n_i - 1$ D.F. Summing for all the h arrays we see that the first sum in 
the second member of (29), divided by $\sigma_2^2$, is a $\chi^2$ with $\sum_i (n_i - 1)$ or 
n − h D.F. And, because the means of the arrays are distributed 
independently of their variances, it follows from §81, Theorem II, 
that the last sum in (29), divided by $\sigma_2^2$, is a $\chi^2$ with h − 1 D.F.* We 
may therefore write 

$$\eta^2 = \frac{n\eta^2S_2^2}{nS_2^2} = \frac{n\eta^2S_2^2/\sigma_2^2}{n\eta^2S_2^2/\sigma_2^2 + n(1 - \eta^2)S_2^2/\sigma_2^2} = \frac{\chi_{h-1}^2}{\chi_{h-1}^2 + \chi_{n-h}^2},$$

in which the subscript indicates the number of D.F. for the $\chi^2$. 
Thus, in virtue of §81, Theorem III, $\eta^2$ is a $\beta_1(\tfrac12(h - 1), \tfrac12(n - h))$ 
variate, and we may state 

Theorem VI. For random samples of n pairs of values from an 
uncorrelated bivariate normal population, the square of the correlation 
ratio of y on x is a Beta variate of the first kind with parameters 
$\tfrac12(h - 1)$ and $\tfrac12(n - h)$, where h is the number of arrays of y's. 

* See also Ex. 7 at the end of this chapter. 

The distribution of $\eta^2$ is thus 

$$dP = \frac{(\eta^2)^{(h-3)/2}(1 - \eta^2)^{(n-h-2)/2}\,d(\eta^2)}{B(\tfrac12(h - 1), \tfrac12(n - h))}, \tag{31}$$

and its mean value,* by §69 (23), is 

$$E(\eta^2) = \frac{h - 1}{n - 1}. \tag{32}$$

* Cf. Fisher, 1922, 2, pp. 604-6. 

EXAMPLES IX 

1. Apply the $\chi^2$ test to show that the deaths of centenarians 
recorded in the example of §18 may reasonably be regarded as a 
random sample from a Poissonian population. 

2. A certain hypothesis was tested by two similar experiments, 
which gave $\chi^2 = 14.7$ for $\nu = 9$ and $\chi^2 = 14.9$ for $\nu = 11$. Show that 
the two experiments combined give less reason for confidence in the 
hypothesis than either experiment alone. 

3. Prove by mathematical induction that the sum of squares 
of n independent standard normal variates conforms to the $\chi^2$ 
distribution (2) of §74. (See Fisher, 1935, 1, pp. 353-4.) 

4. Geometrical proof of the $\chi^2$ distribution. The following geometrical 
proof, that the sum of the squares of n independent standard 
normal variates $x_i$ has a distribution given by (2) of §74, is 
due to Fisher (1935, 1, p. 354). As in §74 the probability that the 
values of the variates will fall simultaneously in the respective 
intervals $dx_i$ is 

$$dp = (2\pi)^{-n/2}\exp(-\tfrac12\chi^2)\,dx_1\,dx_2 \dots dx_n, \tag{i}$$

where $\chi^2 = \sum_i x_i^2$. 

Let us regard the $x_i$ as coordinates of the current point P in Euclidean 
space of n dimensions. Then dp is the probability that P will fall 
in the element of volume $dx_1\,dx_2 \dots dx_n$. The coefficient of this 
element of volume in (i) is therefore the probability density for this 
space, and it is proportional to $\exp(-\tfrac12\chi^2)$. Now we can express, in 
terms of $\chi$ and $d\chi$, an element of volume in which the value of $\chi$ 
may be regarded as constant. For, since $\chi^2$ is the square of the 
distance of P from the origin, $\chi$ is constant over the surface of a 
hypersphere with radius $\chi$ and centre at the origin. The volume 
enclosed by this hypersphere is proportional to $\chi^n$, and the element 
of volume between this and the adjacent hypersphere of radius 
$\chi + d\chi$ is proportional to $d(\chi^n)$, that is to $\chi^{n-1}\,d\chi$. Using the above 
value of the probability density, we see that the probability that 
P will fall in the region bounded by the two hyperspheres is proportional 
to $\chi^{n-1}\exp(-\tfrac12\chi^2)\,d\chi$; and this is the probability that the 
value of $\chi$ for the random sample will fall in the interval $d\chi$. The 
above probability is clearly proportional to 

$$(\tfrac12\chi^2)^{(n-2)/2}\exp(-\tfrac12\chi^2)\,d(\tfrac12\chi^2),$$

and, since $\tfrac12\chi^2$ must lie between 0 and +∞, the constant factor is 
$1/\Gamma(\tfrac12 n)$, so that the integral of the probability throughout this 
range may be unity. Since $\chi^2$ lies in the interval $d(\chi^2)$ when $\chi$ lies in 
the interval $d\chi$, it follows that the distribution of $\chi^2$ is given by 
§74(2). 

For still another proof of the $\chi^2$ distribution see Plummer, 1940, 
1, p. 246. 

5. Homogeneity of several estimates of the population variance. 
Suppose that k independent samples have furnished estimates $s_i^2$ of 
the population variance, based on $\nu_i$ (i = 1, …, k) D.F. Are these 
estimates such that the samples may be regarded as drawn from 
the same population? In other words, are the estimates homogeneous? 

On the hypothesis that the samples are from the same population, 
an unbiased estimate $s^2$ of the variance of the population is 

$$s^2 = \sum_i \nu_is_i^2/\nu, \qquad \Bigl(\nu = \sum_i \nu_i\Bigr),$$

since the expected value of this quantity is the variance of the 
population, in virtue of §54(8). Bartlett* has shown that the 
statistic 

$$\chi^2 = \frac{1}{C}\Bigl(\nu\log_e s^2 - \sum_i \nu_i\log_e s_i^2\Bigr), \quad\text{where}\quad C = 1 + \frac{1}{3(k - 1)}\Bigl(\sum_i \frac{1}{\nu_i} - \frac{1}{\nu}\Bigr),$$

is distributed approximately as $\chi^2$ with k − 1 D.F. The value of $\chi^2$ 
calculated from the data will tell whether the hypothesis of homogeneity 
is reasonable. 

6. Show that the estimates 3.8, 4.4, 8.1, 6.1 and 9.4 of the 
population variance, based on 5, 8, 6, 7 and 4 D.F. respectively, may 
be regarded as homogeneous according to the test of Ex. 5 (k = 5, 
$\chi^2 = 1.46$). 
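
The calculation asked for in Ex. 6 may be sketched as follows. Two 
assumptions are made and should be borne in mind: the statistic is taken 
in the usual form of Bartlett's test (the logarithmic expression divided by 
the correcting factor 1 + [Σ 1/ν_i − 1/ν]/3(k − 1)), and the degrees of freedom 
are read as 5, 8, 6, 7 and 4, which is the reading that reproduces the 
stated value $\chi^2 = 1.46$. The function name is ours. 

```python
import math


def bartlett_chi_square(estimates, dfs):
    """Bartlett's statistic for the homogeneity of k variance estimates
    s_i^2 based on nu_i D.F.; returns the statistic and its D.F., k - 1."""
    k = len(estimates)
    nu = sum(dfs)
    pooled = sum(v * s2 for v, s2 in zip(dfs, estimates)) / nu
    m = nu * math.log(pooled) - sum(v * math.log(s2)
                                    for v, s2 in zip(dfs, estimates))
    c = 1.0 + (sum(1.0 / v for v in dfs) - 1.0 / nu) / (3.0 * (k - 1))
    return m / c, k - 1


chi2, df = bartlett_chi_square([3.8, 4.4, 8.1, 6.1, 9.4], [5, 8, 6, 7, 4])
print(round(chi2, 2), df)
```

For 4 D.F. a value of this size is far below any usual level of significance, 
so the estimates may be regarded as homogeneous. 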

7. That the sum $\sum_i n_i(\bar{y}_i - \bar{y})^2/\sigma_2^2$ of §83 is distributed like $\chi^2$ 
with h − 1 D.F. may also be shown as follows. Since $\bar{y}$ is the weighted 
mean of the $\bar{y}_i$ we have 

$$\sum_i n_i(\bar{y}_i - \mu')^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2 + n(\bar{y} - \mu')^2. \tag{ii}$$

Now $\bar{y}_i - \mu'$ is distributed normally about zero as mean with 
variance $\sigma_2^2/n_i$. Consequently $n_i(\bar{y}_i - \mu')^2/\sigma_2^2$ is a $\chi^2$ with 1 D.F.; and, 
on summing for all the arrays, we see that the first member of (ii) 
divided by $\sigma_2^2$ is a $\chi^2$ with h D.F. Similarly, $\bar{y} - \mu'$ is distributed 
normally about zero as mean with variance $\sigma_2^2/n$; and the last term 
in (ii), divided by $\sigma_2^2$, is therefore a $\chi^2$ with 1 D.F. Theorem II of §81 
then shows that the first sum on the right of (ii), divided by $\sigma_2^2$, is 
a $\chi^2$ with h − 1 D.F., as stated. 

* Proc. Roy. Soc. A, vol. 160, 1937, pp. 268-82. See also Neyman and 
Pearson, 1931, 3. 

8. Taking the correlation ratio as positive, deduce from §83 (31) 
that the mean value of the correlation ratio of y on x, for samples 
from an uncorrelated bivariate normal population, is 

$$\frac{B(\tfrac12 h, \tfrac12(n - h))}{B(\tfrac12(h - 1), \tfrac12(n - h))}.$$

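9. Integrating by parts show that, if r is a positive integer, 

$$\frac{1}{r!}\int_\mu^\infty x^re^{-x}\,dx = \frac{\mu^re^{-\mu}}{r!} + \frac{1}{(r - 1)!}\int_\mu^\infty x^{r-1}e^{-x}\,dx,$$

and, by continuing the process, show that the value of the integral is 

$$e^{-\mu}\Bigl(1 + \mu + \frac{\mu^2}{2!} + \frac{\mu^3}{3!} + \dots + \frac{\mu^r}{r!}\Bigr).$$

Hence show that the probability, in random sampling, that the 
value of $\chi^2$ with $\nu$ (even) D.F. will exceed $\chi_0^2$ is obtained from this 
expression by putting $\mu = \tfrac12\chi_0^2$ and $r = \tfrac12(\nu - 2)$. (Cf. Fisher, 1936, 
1, pp. 366-7.) 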

10. The variables x, y are normally correlated with coefficient $\rho$. 
Show that u and v, defined by 

$$u = x/\sigma_1 + y/\sigma_2, \qquad v = x/\sigma_1 - y/\sigma_2,$$

are independent normal variates with variances $2(1 + \rho)$ and $2(1 - \rho)$ 
respectively; and that, if R is the correlation between u and v in 
random samples of n pairs from the bivariate population, $R^2$ is a 
$\beta_1(\tfrac12, \tfrac12 n - 1)$ variate. 



CHAPTER X 


FURTHER TESTS OF SIGNIFICANCE. 
SMALL SAMPLES 


84. Small samples 

The use of the standard errors of statistics in the tests of Chapters 
VI and VII depends upon the fact that, in the case of large samples, 
the sampling distributions of many statistics are approximately 
normal, or at any rate unimodal with the property that a value of 
the statistic, deviating from its mean by more than two or three 
times its S.E., is very unlikely. For small samples, however, the 
distributions of statistics are often far from normal. Moreover, 
the estimate of a parameter of the population made from a small 
sample is not at all reliable. For these reasons the use of standard 
errors in connection with such samples is very limited. The chief 
concern of the theory of small samples is with the distributions of 
various statistics, and the applications of tests of significance based 
upon these distributions. In each application we test a hypothesis 
concerning the source of the sample. This hypothesis may be, for 
instance, that a certain parameter of the population has a specified 
value, or that two given samples were drawn from the same population. 
In each instance the test employed enables us to form a conclusion 
based on considerations of probability. The nature of the 
tests is illustrated by the $\chi^2$ test of §78, which is applicable to 
samples of any size. But, as already indicated, the test of goodness 
of fit considered in §79 can be applied only to large samples. 

We remind the reader of two important results obtained earlier. 
First, in simple sampling from a population with mean $\mu$ and 
variance $\sigma^2$, the distribution of the mean $\bar{x}$ of the sample of n 
members has $\mu$ for its mean and $\sigma^2/n$ for its variance (see §60). 
This result holds whether the sample is large or small, and whether 
the population is normal or not. In the case of a normal population 
the distribution of $\bar{x}$ is also normal (see §§ 22 or 82). Secondly, the 
statistic $s^2$ defined by 

$$(n - 1)s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{1}$$

is an unbiased estimate of the population variance $\sigma^2$, in the sense 
that $E(s^2) = \sigma^2$ (cf. §64 (8)). This estimate of $\sigma^2$ is said to be based 
on n − 1 D.F., since the variates $x_i - \bar{x}$ are not independent, being 
connected by the linear relation $\sum_i (x_i - \bar{x}) = 0$. The number of 
independent variates is thus only n − 1; and this agrees with the 
result of §77, that $(n - 1)s^2/\sigma^2$ is distributed like $\chi^2$ with n − 1 D.F. 
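
The sense in which $s^2$ is unbiased can be made concrete by brute force. 
The sketch below takes a small finite population, enumerates every equally 
likely ordered sample of n members drawn with replacement (our reading 
of simple sampling for this purpose), and averages $s^2$ exactly with rational 
arithmetic. The population values and function names are hypothetical. 

```python
from fractions import Fraction
from itertools import product


def unbiased_variance(sample):
    """s^2 of (1): sum of squared deviations from the sample mean,
    divided by n - 1."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def expected_s2(population, n):
    """Exact mean of s^2 over all ordered samples of n members drawn
    with replacement, each sample being equally likely."""
    samples = list(product(population, repeat=n))
    return sum(unbiased_variance(s) for s in samples) / len(samples)


def population_variance(population):
    """Variance of the finite population (divisor: the population size)."""
    size = len(population)
    mean = Fraction(sum(population), size)
    return sum((x - mean) ** 2 for x in population) / size


population = [1, 2, 4, 7]   # hypothetical values
print(expected_s2(population, 3), population_variance(population))
```

Replacing the divisor n − 1 by n in `unbiased_variance` makes the average 
fall short of the population variance by the factor (n − 1)/n. 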

The following discussion applies to samples of any size. It is 
assumed, however, that the populations from which the samples 
are drawn are normal. The results obtained are therefore strictly 
true only in this case. But they are approximately true, and may be 
usefully applied, in most cases in which the departure of the population 
from normality is not very marked. 


‘Student’s’ Distribution 

85. The statistic t and its distribution 

We have seen that, in simple sampling from a normal population 
of mean $\mu$ and variance $\sigma^2$, the deviation $\bar{x} - \mu$ is a normal variate, 
with mean zero and S.D. $\sigma/\sqrt{n}$. The quotient of $\bar{x} - \mu$ by $\sigma/\sqrt{n}$ is 
therefore distributed normally with unit S.D. If, however, in place 
of the constant $\sigma$ we use the variable estimate s obtained from the 
sample, we have the statistic 

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \tag{2}$$

which is not normally distributed. The distribution of t was first 
found by W. S. Gosset, who wrote under the nom de plume of 
‘Student’. It follows very simply from the theorems of §§67, 73 
and 77. For, from its definition, 

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{\sqrt{nS^2/\nu}},$$

where $nS^2 = \sum_i (x_i - \bar{x})^2$ and $\nu$ is the number of D.F., n − 1, on which 
the estimate $s^2$ is based. Hence 

$$\frac{t^2}{\nu} = \frac{\tfrac12(\bar{x} - \mu)^2 \div (\sigma^2/n)}{\tfrac12 nS^2/\sigma^2}.$$

But, in virtue of §67 (Theorem I) and §77, the numerator in the 
second member is a Gamma variate with parameter $\tfrac12$, and the 
denominator a Gamma variate with parameter $\tfrac12\nu$; and these are 
distributed independently of each other (cf. §82 or Ex. VIII, 11). 
Consequently $t^2/\nu$ is a $\beta_2(\tfrac12, \tfrac12\nu)$ variate; and its distribution is 
obtained by substituting $t^2/\nu$ for x in §72(27), with $l = \tfrac12$ and 
$m = \tfrac12\nu$. Thus the probability that, in random sampling, the value 
of $t^2$ will fall in the interval $d(t^2)$ is 

$$dP = \frac{(t^2)^{-1/2}\,d(t^2)}{\sqrt{\nu}\,B(\tfrac12, \tfrac12\nu)\,(1 + t^2/\nu)^{(\nu+1)/2}}. \tag{3}$$


By integrating with respect to $t^2$ from a fixed value $t_0^2$ to infinity,* 
we obtain the probability that the value of $t^2$ will exceed $t_0^2$. The 
accompanying table contains extracts from a more complete one 
given by Fisher. In the body of the table are given the values of $t_0$ 
corresponding to certain fixed values of $\nu$ and P. Thus, for a specified 
number $\nu$ of D.F., the value of P at the top of the column is the 
probability that, in random sampling, the numerical value of t in 
the body of the table will be exceeded. 

The range of $t^2$ is from 0 to ∞, and its distribution is given by (3). 
The statistic t, however, ranges from −∞ to +∞, and its distribution 
is therefore 

$$dP = \frac{dt}{\sqrt{\nu}\,B(\tfrac12, \tfrac12\nu)\,(1 + t^2/\nu)^{(\nu+1)/2}}, \tag{4}$$

the factor 2 disappearing, since the integral of dP over the whole 
range of variation of t must be unity. The distribution (4) is spoken 
of as the t distribution corresponding to $\nu$ D.F. And from the above 
argument it is clear that any statistic t, whose range is from −∞ to 
+∞, and which is such that $t^2/\nu$ is a $\beta_2(\tfrac12, \tfrac12\nu)$ variate, conforms to 
the t distribution for $\nu$ D.F. This important result may also be 
expressed in the form of 

Theorem I. A statistic t conforms to the t distribution for $\nu$ D.F. if its 
range is from −∞ to +∞, and $t^2/\nu$ is expressible as the quotient of two 
independent variates, which are distributed like $\chi^2$ with 1 and $\nu$ D.F. 
respectively. 

* For a method of evaluating this integral see Fisher, 1935, 1, pp. 358-60. 

Table 4. Values of |t| with a probability P 
of being exceeded in random sampling 

ν = number of degrees of freedom 

   ν    P = 0.50     0.10     0.05     0.02     0.01 
   1      1.000      6.31    12.71    31.82    63.66 
   2      0.816      2.92     4.30     6.96     9.92 
   3      0.765      2.35     3.18     4.54     5.84 
   4      0.741      2.13     2.78     3.75     4.60 
   5      0.727      2.02     2.57     3.36     4.03 
   6      0.718      1.94     2.45     3.14     3.71 
   7      0.711      1.90     2.36     3.00     3.50 
   8      0.706      1.86     2.31     2.90     3.36 
   9      0.703      1.83     2.26     2.82     3.25 
  10      0.700      1.81     2.23     2.76     3.17 
  11      0.697      1.80     2.20     2.72     3.11 
  12      0.695      1.78     2.18     2.68     3.06 
  13      0.694      1.77     2.16     2.65     3.01 
  14      0.692      1.76     2.14     2.62     2.98 
  15      0.691      1.75     2.13     2.60     2.95 
  16      0.690      1.75     2.12     2.58     2.92 
  17      0.689      1.74     2.11     2.57     2.90 
  18      0.688      1.73     2.10     2.55     2.88 
  19      0.688      1.73     2.09     2.54     2.86 
  20      0.687      1.72     2.09     2.53     2.84 
  21      0.686      1.72     2.08     2.52     2.83 
  22      0.686      1.72     2.07     2.51     2.82 
  23      0.685      1.71     2.07     2.50     2.81 
  24      0.685      1.71     2.06     2.49     2.80 
  25      0.684      1.71     2.06     2.48     2.79 
  26      0.684      1.71     2.06     2.48     2.78 
  27      0.684      1.70     2.05     2.47     2.77 
  28      0.683      1.70     2.05     2.47     2.76 
  29      0.683      1.70     2.04     2.46     2.76 
  30      0.683      1.70     2.04     2.46     2.75 
  35      0.682      1.69     2.03     2.44     2.72 
  40      0.681      1.68     2.02     2.42     2.71 
  45      0.680      1.68     2.02     2.41     2.69 
  50      0.679      1.68     2.01     2.40     2.68 
  60      0.678      1.67     2.00     2.39     2.66 
   ∞      0.674      1.64     1.96     2.33     2.58 

Reproduced by permission of the author, Professor R. A. Fisher, from his book 
on Statistical Methods for Research Workers. 
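
An entry of Table 4 may be checked by integrating the t distribution (4) 
numerically. The sketch below uses Simpson's rule over the range from 
$-t_0$ to $t_0$ and subtracts the result from unity; the function names and the 
number of steps are our own choices, and the method is an illustration 
rather than the way in which the table was computed. 

```python
import math


def t_density(t, nu):
    """Ordinate of the t distribution (4) for nu D.F."""
    beta = math.gamma(0.5) * math.gamma(nu / 2.0) / math.gamma((nu + 1) / 2.0)
    return 1.0 / (math.sqrt(nu) * beta
                  * (1.0 + t * t / nu) ** ((nu + 1) / 2.0))


def prob_exceeding(t0, nu, steps=2000):
    """P(|t| > t0): one minus the Simpson's-rule integral of (4)
    between -t0 and t0."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even count
    h = 2.0 * t0 / steps
    total = t_density(-t0, nu) + t_density(t0, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(-t0 + i * h, nu)
    return 1.0 - total * h / 3.0


print(prob_exceeding(2.31, 8))   # tabulated under P = 0.05 for 8 D.F.
print(prob_exceeding(1.0, 1))    # tabulated under P = 0.50 for 1 D.F.
```

Because the tabulated values are rounded to two decimal places, the 
computed probabilities agree with the column headings only approximately. 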

The reader should sketch the probability curves for $t^2$ and t. 
The former is asymptotic to both axes, with ordinate which decreases 
continuously as $t^2$ increases. The probability curve of t is symmetrical 
about the line t = 0. It is asymptotic to the t-axis at each end, and 
the maximum ordinate is that for t = 0. Thus small values of $t^2$ are 
more likely than larger values. In this respect $t^2$ differs from $\chi^2$ 
when the latter has more than 2 D.F. 

86. Test for an assumed population mean 

The statistic t and the above table provide a means of testing an 
assumed value $\mu$ for the mean of the normal population from which 
the random sample was drawn. On the hypothesis that $\mu$ is the 
true value, the equation (2) and the values of $\bar{x}$ and s derived from 
the sample enable us to calculate t. The table then gives the probability 
P that this value will be exceeded numerically in random 
sampling from a normal population with mean $\mu$. If P is less than 
0.05 we regard our value of t as significant. If P is less than 0.01 
we regard it as highly significant. A significant value of t throws 
doubt on the truth of the hypothesis that $\mu$ is the mean of the 
population. 

Example. A random sample of nine from the men of a large city gave a 
mean height of 68 in.; and the unbiased estimate of the population variance 
found from the sample was 4.5 in.² Are these data consistent with the 
assumption of a mean height of 68.5 in. for the men of the city? 

Large populations of heights of men are known to be approximately 
normal. For the given sample 

$$\bar{x} = 68.0, \qquad \nu = 9 - 1 = 8, \qquad s = \sqrt{4.5} = 2.12,$$

and therefore, on the hypothesis that $\mu = 68.5$, we have 

$$|t| = |\bar{x} - \mu|\sqrt{n}/s = (0.5) \times 3/2.12 = 0.707.$$

From the table we find that, for $\nu = 8$, the probability that this value of t 
will be exceeded numerically in random sampling is about 0.50. The value 
is therefore not at all significant, and the test provides no evidence against 
the assumption of a population mean of 68.5 in. 

If we make the assumption that $\mu = 69.5$ or 66.5 in. we obtain a value of 
|t| three times as great as before, viz. 2.12. The probability that, for $\nu = 8$, 
this value will be exceeded in random sampling is greater than 0.05, and the 
new value of t is still not significant. There are thus fairly wide limits for the 
assumed population mean which, with the data of the sample, will provide 
a value of t which is not significant. These limits we proceed to consider. 
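
The values of t in this example may be reproduced as follows. The sketch 
simply evaluates equation (2) for each assumed mean and compares |t| with 
the 5% value for 8 D.F. taken from Table 4; the function and constant names 
are ours. 

```python
import math


def t_for_assumed_mean(sample_mean, assumed_mean, variance_estimate, n):
    """t of equation (2): (sample mean - assumed mean) * sqrt(n) / s,
    where s^2 is the unbiased estimate of the population variance."""
    s = math.sqrt(variance_estimate)
    return (sample_mean - assumed_mean) * math.sqrt(n) / s


T_5_PERCENT_8_DF = 2.31   # Table 4, 8 D.F., P = 0.05

for mu in (68.5, 69.5, 66.5):
    t = t_for_assumed_mean(68.0, mu, 4.5, 9)
    # Each line shows the assumed mean, |t|, and whether |t| is significant.
    print(mu, round(abs(t), 3), abs(t) > T_5_PERCENT_8_DF)
```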

87. Fiducial limits* for the population mean 

Suppose that a certain sample from a normal population has a 
mean $\bar{x}$, and provides an unbiased estimate $s^2$ of the population 
variance based on $\nu$ D.F. We wish to find limits to the assumed 
population mean $\mu$ so that, with the data of the sample, it will lead 
to a value of t that is not significant. If our choice is the 5% level 
of significance, we define the 95% confidence range for $\mu$ as that 
range of values of $\mu$ which, with the data of the sample, will furnish 
a value of |t| less than the value $t_1$ which corresponds to P = 0.05. 
This requires 

$$|\bar{x} - \mu|\sqrt{n}/s < t_1,$$

so that 

$$\bar{x} - st_1/\sqrt{n} < \mu < \bar{x} + st_1/\sqrt{n}. \tag{5}$$

Consequently $\mu$ must lie within the range extending from $\bar{x} - st_1/\sqrt{n}$ 
to $\bar{x} + st_1/\sqrt{n}$, which is called the 95% confidence range for $\mu$ corresponding 
to the given sample; and the bounding values of this range 
are the corresponding confidence limits, or fiducial limits, for the 
mean of the population. Similarly we have confidence ranges and 
limits corresponding to other levels of significance. In each case the 
appropriate value of $t_1$ is found from the table of t. 

* Cf. Fisher, 1930, 8 and 1933, 1. 

Example 1. Find the 95% fiducial limits for $\mu$ corresponding to the sample 
in the example of §86. 

Here $\nu = 8$, $t_1 = 2.31$, s = 2.12, n = 9. Hence 

$$st_1/\sqrt{n} = (2.12)(2.31)/3 = 1.63.$$

The required limits are therefore 68 ± 1.63, that is 66.37 and 69.63 in. 

Example 2. Find the 98% fiducial limits for $\mu$ corresponding to the same 
sample. 

From the table we find that $t_1 = 2.90$ is the value of t which is exceeded 
numerically with a probability of 2%. Then 

$$st_1/\sqrt{n} = (2.12)(2.90)/3 = 2.05,$$

and the required limits are 68 ± 2.05, that is 65.95 and 70.05 in. 
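
Both pairs of limits follow from (5) once $t_1$ has been read from the table. 
A minimal sketch, in which the function name is ours and the values of $t_1$ 
are those quoted in the two examples: 

```python
import math


def fiducial_limits(sample_mean, variance_estimate, n, t1):
    """Limits of (5): sample mean -/+ s * t1 / sqrt(n), where s^2 is the
    unbiased estimate of the population variance."""
    half_width = math.sqrt(variance_estimate) * t1 / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low95, high95 = fiducial_limits(68.0, 4.5, 9, 2.31)   # P = 0.05, 8 D.F.
low98, high98 = fiducial_limits(68.0, 4.5, 9, 2.90)   # P = 0.02, 8 D.F.
print(round(low95, 2), round(high95, 2))
print(round(low98, 2), round(high98, 2))
```

The assumed means 69.5 and 66.5 in. of §86 both lie inside the 95% range, 
which is why neither gave a significant value of t. 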

88. Comparison of the means of two samples 

Given two independent samples of $n_1$ and $n_2$ members, with means 
$\bar{x}_1$ and $\bar{x}_2$ respectively, we may use the t distribution to decide 
whether the means differ significantly, or whether the two samples 
may be regarded as drawn from the same normal population.* We 
test the hypothesis that they are from the same normal population. 

* Cf. Fisher, 1925, 1, pp. 90-3. 

Let $x_i$ (i = 1, …, $n_1$) be the values of the variable in the first sample, 
and $x'_j$ (j = 1, …, $n_2$) those in the second. Then the sums of squares 
for the two samples are 

$$n_1S_1^2 = \sum_i (x_i - \bar{x}_1)^2, \qquad n_2S_2^2 = \sum_j (x'_j - \bar{x}_2)^2$$

respectively. Also, if $\sigma^2$ is the variance of the population, $n_1S_1^2/\sigma^2$ 
and $n_2S_2^2/\sigma^2$ are distributed like $\chi^2$ with $n_1 - 1$ and $n_2 - 1$ D.F. 
respectively. Consequently $(n_1S_1^2 + n_2S_2^2)/\sigma^2$ is a $\chi^2$ with $\nu$ D.F., 
where 

$$\nu = n_1 + n_2 - 2.$$

An unbiased estimate of the population variance, obtained from the 
samples, is 

$$s^2 = (n_1S_1^2 + n_2S_2^2)/\nu,$$

since 

$$E(\nu s^2) = E(n_1S_1^2 + n_2S_2^2) = (n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2 = \nu\sigma^2,$$

so that $E(s^2) = \sigma^2$. 

Now, in virtue of the hypothesis, $\bar{x}_1$ and $\bar{x}_2$ are normally distributed 
about the population mean with variances $\sigma^2/n_1$ and 
$\sigma^2/n_2$ respectively. Therefore, since the samples are independent, the 
difference $\bar{x}_1 - \bar{x}_2$ is normally distributed about zero with variance 
$\sigma^2(1/n_1 + 1/n_2)$. If then we define a statistic t by the equation 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}, \tag{6}$$

we have 

$$\frac{t^2}{\nu} = \frac{(\bar{x}_1 - \bar{x}_2)^2 \div \sigma^2(1/n_1 + 1/n_2)}{(n_1S_1^2 + n_2S_2^2)/\sigma^2}.$$

The numerator and denominator of the second member are distributed 
independently like $\chi^2$ with 1 and $\nu$ D.F. respectively. Hence, 
by the Theorem of §85, the statistic t conforms to the t distribution 
for $\nu$ D.F. It is the quotient of the normal deviate $\bar{x}_1 - \bar{x}_2$ by the 
estimate of its S.E. derived from the samples. If the value of t 
obtained from the samples is significant, our hypothesis is discredited. 

We might vary the above argument to test the hypothesis that
the samples were drawn from different normal populations, with
means $\mu$ and $\mu'$ respectively, but the same variance. In this case
$\bar{x}_1 - \mu$ and $\bar{x}_2 - \mu'$ are normally distributed about zero with variances
$\sigma^2/n_1$ and $\sigma^2/n_2$ respectively. Hence their difference

$$ (\bar{x}_1 - \bar{x}_2) - (\mu - \mu') $$

is normally distributed about zero with variance $\sigma^2(1/n_1 + 1/n_2)$.
This difference takes the place of $\bar{x}_1 - \bar{x}_2$ in the above argument, so
that

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu - \mu')}{s\sqrt{1/n_1 + 1/n_2}} \tag{7} $$

conforms to the $t$ distribution for $\nu$ D.F.


Example 1. Show that the 95 % fiducial limits for the difference of the
means of the populations are $(\bar{x}_1 - \bar{x}_2) \pm t_1 s\sqrt{1/n_1 + 1/n_2}$, where $t_1$ is the value
of $t$ corresponding to $P = 0.05$ and $\nu$ D.F. If zero lies outside the above range
of values, we conclude that the difference between the means of the samples is
significant of a difference between the means of the populations.

Example 2. The heights of the ten men of a random sample from an
unknown population gave a mean of 69 in., and a sum of squares of deviations
from the mean equal to 42 in.² Apply the $t$ test to the hypothesis that this
sample is from the same population as that of the example in § 86.

From the data we have

$$ \bar{x}_1 = 68, \quad \bar{x}_2 = 69, \quad n_1 S_1^2 = 36, \quad n_2 S_2^2 = 42, \quad n_1 = 9, \quad n_2 = 10. $$

The estimate $s^2$ of the population variance is

$$ s^2 = (36 + 42)/17 = 4.59, $$

so that $s = 2.14$. The value of $t$ from the sample is then, numerically,

$$ |t| = \frac{69 - 68}{2.14\sqrt{1/9 + 1/10}} = 1.02. $$

For $\nu = 17$ this value is not at all significant. The test therefore provides no
evidence against the hypothesis.
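
The steps of this test may be sketched in Python as follows. The code is an illustration only and is not part of the original text; the function name `pooled_t` is an assumption, and the significance of the result must still be judged from the table of $t$.

```python
import math

def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Two-sample t of section 88 from means and sums of squared deviations.

    ss1 and ss2 are the sums of squares n1*S1^2 and n2*S2^2.
    Returns (t, nu, s2), where s2 is the pooled estimate of variance.
    """
    nu = n1 + n2 - 2
    s2 = (ss1 + ss2) / nu
    t = (mean1 - mean2) / math.sqrt(s2 * (1.0 / n1 + 1.0 / n2))
    return t, nu, s2

# Data of the example: the two samples of heights.
t, nu, s2 = pooled_t(68.0, 36.0, 9, 69.0, 42.0, 10)
print(round(t, 2), nu, round(s2, 2))
```

The sign of $t$ merely records which sample has the larger mean; only its numerical value is compared with the table.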


89. Significance of an observed correlation
The distribution and table of $t$ may also be used to test the
significance of a value $r$ of the correlation coefficient, given by a
random sample of $n$ pairs of values from a bivariate normal population.
The S.E. of $r$ may not be used in the case of small samples,
since the distribution of $r$ is then far from normal. We have seen
that, if the variables in the normal population are uncorrelated,
the value of $r^2$ in random samples of $n$ pairs is a $\beta_1(\tfrac12, \tfrac12(n-2))$
variate. Therefore, by Theorem VII of § 72, $r^2/(1 - r^2)$ is a
$\beta_2(\tfrac12, \tfrac12(n-2))$ variate. Thus, if $t$ is defined by

$$ t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2}, \tag{8} $$

then $t^2/(n - 2)$ is a $\beta_2(\tfrac12, \tfrac12(n-2))$ variate, so that $t$ conforms to the
$t$ distribution for $n - 2$ D.F.

To test the significance of the sample value of r, we make the 
assumption that the variables in the population are uncorrelated. 
A simple calculation then gives the value of t corresponding to the 
sample; and the table of t then tells whether the value obtained is a 
rare one. If it is, our assumption is discredited, and we conclude 
that the variables in the population are probably correlated. 


Example 1. Is a correlation coefficient of 0.5 significant, if obtained
from a random sample of 11 pairs of values from a normal population?

Here $r = \tfrac12$, $\nu = 9$, $t = \tfrac12 \times 3 \div \sqrt{3/4} = 1.73$. From the table we find
that the probability of obtaining a value of $t$ numerically larger than this is greater than
0.10. Hence there is no reason to suspect the hypothesis of uncorrelated
variables in the population. The value 0.5 is not significant.


Example 2. Find the least value of $r$, in a sample of 27 pairs from a normal
population, that is significant at the 5 % level.

Here $\nu = 25$ and, at the 5 % level of significance, $t = 2.06$. Hence for $r$ to
be significant we must have, numerically,

$$ \frac{5r}{\sqrt{1 - r^2}} > 2.06, $$

which requires $r^2 > 0.145$ and therefore $|r| > 0.38$. Values of $r$ numerically
less than 0.38 are not significant at the 5 % level.
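
Both calculations rest on relation (8) and its inverse, $|r| = t/\sqrt{t^2 + \nu}$. A minimal sketch in Python follows; it is not part of the original text, the function names are illustrative, and the critical value of $t$ is assumed to be read from the table rather than computed.

```python
import math

def t_from_r(r, n):
    """Statistic (8): t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 D.F."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def least_significant_r(t_crit, n):
    """Least |r| whose t reaches t_crit, obtained by solving (8) for r."""
    nu = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + nu)

# r = 1/2 in a sample of 11 pairs gives t = 1.73 for 9 D.F.
print(round(t_from_r(0.5, 11), 2))
# 27 pairs, 5 % value of t for 25 D.F. taken as 2.06 from the table.
print(round(least_significant_r(2.06, 27), 3))
```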


The accompanying short table gives the least values of $|r|$ that
are significant at the 5 % level, for samples of different sizes from a
normal population. These values may be calculated as in Ex. 2
above. The number $\nu$ of degrees of freedom is $n - 2$, where $n$ is the
number of pairs of values in the sample.


Table 5. Minimum values of r that are significant at the 5 % level

    ν      r         ν      r         ν      r         ν      r
    4    0.811      11    0.553      18    0.444      45    0.288
    5    0.755      12    0.532      19    0.433      50    0.273
    6    0.707      13    0.514      20    0.423      60    0.250
    7    0.666      14    0.497      25    0.381      70    0.232
    8    0.632      15    0.482      30    0.349      80    0.217
    9    0.602      16    0.468      35    0.325      90    0.205
   10    0.576      17    0.456      40    0.304     100    0.195

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.


90. Significance of an observed regression coefficient 

The significance of an observed value, $b$, of the linear regression
coefficient of $y$ on $x$, in a random sample from a normal population,
may also be tested by the $t$ distribution and table. It was shown in
§ 83 (Theorem V) that, for random samples of $n$ pairs from an
uncorrelated normal population, $b^2\sigma_1^2/\sigma_2^2$ is a $\beta_2(\tfrac12, \tfrac12(n-1))$ variate.

Consequently the statistic $t$ defined by

$$ t = b\sigma_1\sqrt{n - 1}/\sigma_2, \tag{9} $$

conforms to the $t$ distribution for $n - 1$ D.F. But, since $\sigma_1$ and $\sigma_2$
are usually unknown, this relation is not of much use in providing
a test of significance for $b$. However, a similar result is free from this
objection. For, from the relation

$$ b = rS_2/S_1, \tag{10} $$

in which $S_1^2$ and $S_2^2$ are the variances of $x$ and $y$ in the sample, it
follows that

$$ \frac{b^2\sum(x_i - \bar{x})^2}{\sum(y_i - Y_i)^2} = \frac{nr^2S_2^2/\sigma_2^2}{n(1 - r^2)S_2^2/\sigma_2^2}, $$

where as usual $Y_i$ denotes the estimate of $y_i$ found from the regression
equation. Now, by § 82, the numerator and denominator of the
second member are distributed like $\chi^2$ with 1 and $n - 2$ D.F. respectively.
The quotient is therefore a $\beta_2(\tfrac12, \tfrac12(n-2))$ variate, and the
statistic $t$ defined by

$$ t = b\sqrt{(n - 2)\sum(x_i - \bar{x})^2/\sum(y_i - Y_i)^2} \tag{11} $$

conforms to the $t$ distribution for $n - 2$ D.F., and may be used to
test the significance of the value of $b$ found from the sample. More
generally Fisher* has shown that, if the variables in the population
are correlated, with $\beta$ as coefficient of regression of $y$ on $x$, the
statistic which conforms to the $t$ distribution is obtained from (11)
on replacing $b$ by $(b - \beta)$.
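
The statistic (11) is numerically the same as the statistic (8) of § 89 computed from the same sample, since $b^2\sum(x_i - \bar{x})^2/\sum(y_i - Y_i)^2 = r^2/(1 - r^2)$. The following Python sketch, which is not part of the original text, illustrates this; the function name `regression_t` and the six pairs of values are hypothetical and serve only to exercise the formula.

```python
import math

def regression_t(xs, ys):
    """Return (b, t, r) for the regression of y on x.

    t is computed from (11): t = b * sqrt((n - 2) * Sxx / residual SS),
    where the residuals are taken about the fitted regression line.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    resid = sum((y - (ybar + b * (x - xbar))) ** 2 for x, y in zip(xs, ys))
    t = b * math.sqrt((n - 2) * sxx / resid)
    r = sxy / math.sqrt(sxx * syy)
    return b, t, r

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
b, t, r = regression_t(xs, ys)
t_from_r = r * math.sqrt(len(xs) - 2) / math.sqrt(1 - r * r)
print(round(b, 3), round(t, 3), round(t_from_r, 3))
```

The test of $b$ and the test of $r$ therefore lead to the same conclusion when the hypothesis is that of an uncorrelated population.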


* Fisher, 1922, 2, p. 609.

91. Distribution of the range of a sample

The range, $w$, of a sample of values is the difference between
the highest and lowest values. This concept is widely used in
the statistical control of quality in mass production by an
industrial plant. Though the range does not conform to the $t$ distribution,
it is convenient here to consider the sampling distribution
of $w$ for samples of $n$ values from a continuous population, whose
relative frequency density is $f(x)$ in the interval $(a, b)$. Consider
the infinitesimal intervals $(u, u + du)$ and $(v, v + dv)$ within the
range of variation of $x$, and such that $u < v$. Then the probability
that, at any drawing, the value chosen will lie in the first of these
is $f(u)\,du$, and in the second $f(v)\,dv$. The probability that it will
lie in the intervening interval is

$$ \int_{u+du}^{v} f(x)\,dx. \tag{i} $$

Consequently the probability that, in the drawing of $n$ values,
one will lie in the interval $du$, one in the interval $dv$, and the
remaining $n - 2$ in the intervening interval is

$$ dP = n(n - 1) f(u) f(v) \left[\int_{u}^{v} f(x)\,dx\right]^{n-2} du\,dv, \tag{ii} $$

since $n(n - 1)$ is the number of ways in which one of the $n$ values
may fall in each of the intervals $du$ and $dv$. In other words, $dP$
is the probability that the lowest value in the sample will fall in
the interval $du$, and the highest in the interval $dv$.

We may express this in terms of $u$ and $w$ instead of $u$ and $v$.
Since $w = v - u$ the Jacobian of $u, w$ with respect to $u, v$ is unity,
and the probability that the lowest value will fall in the interval
$du$, and $w$ in the interval $dw$, is therefore

$$ dP = n(n - 1) f(u) f(u + w) \left[\int_{u}^{u+w} f(x)\,dx\right]^{n-2} du\,dw \tag{iii} $$

to within infinitesimals of this order. Summing for all intervals
$du$ consistent with a range $w$, we find the probability that $w$ will
fall in the interval $dw$, irrespective of the value of $u$, as

$$ dp = n(n - 1) \left\{\int_{a}^{b-w} f(u) f(u + w) \left[\int_{u}^{u+w} f(x)\,dx\right]^{n-2} du\right\} dw. \tag{12} $$

This is the required probability differential of $w$. Denoting the
coefficient of $dw$ by $\phi(w)$ we have the expected value of $w$ as

$$ E(w) = \int_{0}^{b-a} w\,\phi(w)\,dw. $$

This expected value has been calculated by numerical integration
for the case of a normal population of S.D. $\sigma$, and for various
values of $n$. Among the results found are

    n:          2      3      4      5      6      10     50

    E(w)/σ:    1.13   1.69   2.06   2.33   2.53   3.08   4.50

Example 1. If the variable in the population has a uniform distribution
from 0 to $b$, then $f(x) = 1/b$. Show that the probability differential
of the range is $n(n - 1)\,w^{n-2}(b - w)\,dw/b^n$, and that the expected value of
$w$ is $(n - 1)b/(n + 1)$.

Example 2. Show that, in taking a random sample of $n$ numbers
between zero and unity, the probability that the range will exceed 0.5
is $1 - (n + 1)/2^n$. Show also that 8 is the least value of $n$ for which this
probability exceeds 0.95.
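
The results stated in these two examples may be checked numerically. The sketch below (Python, not part of the original text) integrates $w\,\phi(w)$ for the uniform population by the midpoint rule and searches for the least $n$ of Example 2; the function names and the number of integration steps are illustrative assumptions.

```python
def prob_range_exceeds_half(n):
    """P(range > 0.5) for n values drawn uniformly between 0 and 1."""
    return 1.0 - (n + 1) / 2 ** n

def expected_range_uniform(n, b=1.0, steps=20000):
    """Midpoint-rule value of E(w), with phi(w) = n(n-1) w^(n-2) (b-w) / b^n."""
    h = b / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        phi = n * (n - 1) * w ** (n - 2) * (b - w) / b ** n
        total += w * phi
    return total * h

# Least n for which the probability exceeds 0.95.
least_n = next(n for n in range(2, 100) if prob_range_exceeds_half(n) > 0.95)
print(least_n)
# Compare the numerical integral with (n - 1) b / (n + 1) for n = 5, b = 1.
print(round(expected_range_uniform(5), 4), round(4 / 6, 4))
```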


Distribution of the Variance Ratio

92. Ratio of independent estimates of the population variance

As in § 88, let us consider two independent random samples whose
values are $x_i$ $(i = 1, \ldots, n_1)$ and $x_j$ $(j = 1, \ldots, n_2)$, with means
$\bar{x}_1$ and $\bar{x}_2$ respectively. These provide estimates $s_1^2$ and $s_2^2$ of the variances
of the populations, given by

$$ s_1^2 = \frac{n_1 S_1^2}{n_1 - 1}, \qquad s_2^2 = \frac{n_2 S_2^2}{n_2 - 1}, $$

corresponding to $\nu_1$ and $\nu_2$ D.F. respectively, where

$$ \nu_1 = n_1 - 1; \qquad \nu_2 = n_2 - 1. $$

We wish to consider whether two such estimates are significantly
different, or whether the samples may be regarded as drawn from
the same normal population of variance $\sigma^2$.

An appropriate test is furnished by the sampling distribution of
the ratio $F$ of two such estimates of $\sigma^2$ obtained from independent
samples from the same normal population. Thus if

$$ F = \frac{s_1^2}{s_2^2}, \tag{13} $$

then

$$ \frac{\nu_1 F}{\nu_2} = \frac{n_1 S_1^2/\sigma^2}{n_2 S_2^2/\sigma^2}. $$

Now the numerator and denominator of the second member are
distributed independently like $\chi^2$ with $\nu_1$ and $\nu_2$ D.F. respectively.
Hence the quotient is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate, so that $\nu_1 F/\nu_2$
conforms to the distribution of § 72, with $l = \tfrac12\nu_1$ and $m = \tfrac12\nu_2$.
Consequently the probability that the value of $F$ will fall in the interval
$dF$ is

$$ dP = \frac{\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\tfrac12\nu_1, \tfrac12\nu_2)} \cdot \frac{F^{\nu_1/2 - 1}}{(\nu_1 F + \nu_2)^{(\nu_1 + \nu_2)/2}}\,dF. \tag{14} $$

This is the required distribution of the variance ratio for $\nu_1$ and $\nu_2$ D.F.
It will be observed that the distribution is independent of the
variance $\sigma^2$ of the population.



Since $\nu_1 F/\nu_2$ is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate its modal value, by § 72 (28),
is $(\tfrac12\nu_1 - 1)/(\tfrac12\nu_2 + 1)$. Consequently the modal value of $F$ is given by

$$ F = \frac{\nu_2(\nu_1 - 2)}{\nu_1(\nu_2 + 2)} = \frac{\nu_1\nu_2 - 2\nu_2}{\nu_1\nu_2 + 2\nu_1}, $$

and this is always less than unity. Similarly, the mean value of
$\nu_1 F/\nu_2$, by § 72 (29), is $\tfrac12\nu_1/(\tfrac12\nu_2 - 1)$; and therefore the expected
value of $F$ is

$$ E(F) = \frac{\nu_2}{\nu_2 - 2}, $$

which is independent of $\nu_1$, and is always greater than unity. The
probability curve for $F$ depends, of course, on both $\nu_1$ and $\nu_2$, but
its main features, for > 4, are those of the curve in Fig. 9. 


93. Fisher's z distribution. Table of F 


A distribution equivalent to (14) was first obtained by R. A. Fisher.
Writing $z = \tfrac12\log_e F$ and therefore $F = e^{2z}$ in the above result, we
deduce immediately that

$$ dP = \frac{2\,\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\tfrac12\nu_1, \tfrac12\nu_2)} \cdot \frac{e^{\nu_1 z}}{(\nu_1 e^{2z} + \nu_2)^{(\nu_1 + \nu_2)/2}}\,dz. \tag{15} $$

This is Fisher's $z$ distribution. The probability that a specified value
of $z$ will be exceeded in random sampling depends upon $\nu_1$ and $\nu_2$.
Fisher published tables* giving the values of $z$ that will be exceeded
with probabilities 0.05 and 0.01 respectively, corresponding to
specified values of $\nu_1$ and $\nu_2$. From these G. W. Snedecor prepared
a table† for the variance ratio, which he denoted by $F$ in honour of
Fisher. Extracts from this table are here printed by permission of
Snedecor and the Iowa Press. The ratio, $F$, tabulated is that of the
larger estimate of variance to the smaller. The number $\nu_1$ of degrees
of freedom corresponding to the larger estimate determines the
column in the table, while $\nu_2$ determines the row. At the
intersection of the row and the column are given two values of $F$. The
upper is the value that will be exceeded with a probability 0.05,
and the lower with a probability 0.01. These are often referred to
as the 5 and 1 % 'points' of $F$. The latter is, of course, always the
larger. The hypothesis to be tested is that the samples are from the
same normal population, or from normal populations of equal
variance. A value of $F$ less than the 5 % point is not significant.
A value between the 5 and 1 % points is significant at the former
level, but not at the latter. A value greater than the 1 % point is
regarded as highly significant. A significant value of $F$ throws doubt
on the truth of the hypothesis. This test will be used extensively in
the next chapter. It is illustrated by the following numerical
example.

* Statistical Methods for Research Workers, 1925.

† Snedecor, 1934, 2 or 1938, 3, pp. 184-7.

Table 6. The variance ratio. 5 and 1 % 'points' of F

$\nu_1$ is the number of degrees of freedom for the greater
estimate of variance, and $\nu_2$ for the smaller

Example. Apply the test to the samples of heights given in the examples
of §§ 86 and 88 (Ex. 2).

The two estimates of the population variance furnished by the samples
are 36/8 and 42/9, corresponding to 8 and 9 D.F. respectively. The second is
the larger, so that

$$ F = \frac{42/9}{36/8} = 1.04. $$

This value of $F$ is much below the 5 % point as indicated by the table. It is
therefore not at all significant, so that the samples may very well be regarded
as drawn from the same population.
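
The formation of the ratio, with the larger estimate in the numerator, together with the modal and mean values of $F$ found in § 92, may be sketched in Python as follows. The code is not part of the original text and the function names are illustrative; the 5 and 1 % points must still be read from the table.

```python
def variance_ratio(ss_a, df_a, ss_b, df_b):
    """Return (F, nu1, nu2) with the larger estimate of variance on top."""
    est_a = ss_a / df_a
    est_b = ss_b / df_b
    if est_a >= est_b:
        return est_a / est_b, df_a, df_b
    return est_b / est_a, df_b, df_a

def f_mode(nu1, nu2):
    """Modal value of F, nu2 (nu1 - 2) / (nu1 (nu2 + 2)); meaningful for nu1 > 2."""
    return nu2 * (nu1 - 2) / (nu1 * (nu2 + 2))

def f_mean(nu2):
    """Expected value of F, nu2 / (nu2 - 2); requires nu2 > 2."""
    return nu2 / (nu2 - 2)

# The two samples of heights: sums of squares 36 and 42 on 8 and 9 D.F.
F, nu1, nu2 = variance_ratio(36.0, 8, 42.0, 9)
print(round(F, 3), nu1, nu2)
print(round(f_mode(nu1, nu2), 3), round(f_mean(nu2), 3))
```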


Fisher's Transformation of the
Correlation Coefficient


94. Distribution of r. Fisher's transformation
The distribution of the correlation coefficient $r$, in random
samples of $n$ pairs of values from a bivariate normal population in
which the correlation is $\rho$, was given by Fisher* in 1915. He showed
that the probability that the correlation coefficient will have a
value in the interval $dr$ is

$$ dp = \frac{(1 - \rho^2)^{(n-1)/2}(1 - r^2)^{(n-4)/2}}{\pi\,(n - 3)!}\;\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\arccos(-r\rho)}{\sqrt{1 - r^2\rho^2}}\right\} dr. $$

This distribution is far from normal, with a probability curve which
is very skew in the neighbourhood of $\rho = \pm 1$, even for large samples.
The use of the S.E. of $r$ is therefore not to be recommended. In a
subsequent paper† Fisher showed that the transformation

$$ z = \tfrac12\log_e\frac{1 + r}{1 - r}, \qquad \zeta = \tfrac12\log_e\frac{1 + \rho}{1 - \rho} \tag{16} $$

defines a variate $z$, whose distribution is approximately normal with
mean $\zeta$ and variance $1/(n - 3)$, tending rapidly to normality as the
size of the sample increases. Thus the S.E. of $z$ is independent of the
value of $\rho$. For the proofs of these properties we must refer the
reader to Fisher's papers; but the distribution of $r$ in the case of an
uncorrelated normal population was considered in § 82, and the
corresponding distribution of $z$ will be discussed in Ex. 1 below.

* Fisher, 1915, 2.

† Fisher, 1921, 1.

By means of the statistic $z$ we may test whether an observed
correlation coefficient differs significantly from some theoretical
value, or from some value given in advance; or whether the values
of $r$ obtained from two samples differ significantly. From the
values of $r$ and $\rho$ we determine those of $z$ and $\zeta$ by (16); and it is
then easy to decide whether the deviation $z - \zeta$ is significant for a
normal distribution of variance $1/(n - 3)$. To obviate the necessity
of calculating $z$ in every case, Fisher published a table setting out
the values of $r$, which correspond to specified values of $z$ ranging
from 0 to 3 at intervals of 0.01. Extracts from this table are printed
herewith.

Table 7. Fisher's transformation of r
Values of r for specified values of z at intervals of 0.02

    z      0.00     0.02     0.04     0.06     0.08
   0.0    0.000    0.020    0.040    0.060    0.080
   0.1    0.100    0.119    0.139    0.159    0.178
   0.2    0.197    0.217    0.236    0.254    0.273
   0.3    0.291    0.310    0.328    0.345    0.363
   0.4    0.380    0.397    0.414    0.430    0.446
   0.5    0.462    0.478    0.493    0.508    0.523
   0.6    0.537    0.551    0.565    0.578    0.592
   0.7    0.604    0.617    0.629    0.641    0.653
   0.8    0.664    0.675    0.686    0.696    0.706
   0.9    0.716    0.726    0.735    0.744    0.753
   1.0    0.762    0.770    0.778    0.786    0.793
   1.1    0.801    0.808    0.814    0.821    0.828
   1.2    0.834    0.840    0.846    0.851    0.857
   1.3    0.862    0.867    0.872    0.876    0.881
   1.4    0.885    0.890    0.894    0.898    0.902
   1.5    0.905    0.909    0.912    0.915    0.919
   1.6    0.922    0.925    0.928    0.930    0.933
   1.7    0.935    0.938    0.940    0.943    0.945
   1.8    0.947    0.949    0.951    0.953    0.955
   1.9    0.956    0.958    0.960    0.961    0.963

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.


For the case $\zeta = 0$, that is to say for testing whether an observed
value of $r$ indicates any correlation in the population, the method
of § 89 is preferable.

Example 1. For samples from an uncorrelated normal population, we
know that the distribution of $r$ is

$$ dp = (1 - r^2)^{(n-4)/2}\,dr / B(\tfrac12, \tfrac12(n - 2)). $$

Fisher's transformation (16) may be expressed

$$ r = \tanh z, $$

so that $dr = \operatorname{sech}^2 z\,dz$,

and the distribution of $z$ is therefore

$$ dp = \operatorname{sech}^{n-2} z\,dz / B(\tfrac12, \tfrac12(n - 2)). $$

Now $\operatorname{sech} z$ is approximately equal to $\exp(-\tfrac12 z^2)$, so that

$$ dp \propto \exp\{-\tfrac12(n - 2)z^2\}\,dz $$

approximately. Consequently the distribution of $z$ is approximately normal,
with variance $1/(n - 2)$. As Fisher has shown, however, a better
approximation to the variance of $z$ is $1/(n - 3)$.

Example 2. In a random sample of 28 pairs of values from a bivariate
normal population, the correlation was found to be 0.7. Is this value
consistent with the assumption that the correlation in the population is 0.5?

Here $r = 0.7$, $\rho = 0.5$ and $n = 28$. From the table we find $z = 0.87$ and
$\zeta = 0.55$, so that $z - \zeta = 0.32$.

The S.E. of $z$ is $1/\sqrt{25} = 0.2$, so that

$$ (z - \zeta)/(\text{S.E.}) = 1.6. $$

Since $z - \zeta$ is considerably less than twice the S.E., its value is not significant.
So far as this test goes, the correlation in the population might very well
be 0.5.

The 95 % fiducial limits for $\rho$ are found in the usual manner (cf. § 51).
The value of $\zeta$ must be such that

$$ |z - \zeta| < 1.96\,(\text{S.E.}) = 0.392. $$

Consequently

$$ 0.87 - 0.392 < \zeta < 0.87 + 0.392, $$

or

$$ 0.48 < \zeta < 1.26, $$

and therefore from the table

$$ 0.446 < \rho < 0.851. $$

The 95 % fiducial limits for $\rho$ are therefore 0.45 and 0.85 approximately.
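
The same test and limits may be computed directly, since (16) is $z = \tanh^{-1} r$. The following Python sketch is not part of the original text, and its function names are illustrative. It uses the inverse hyperbolic tangent in place of the table, so the figures agree with those above only to about two decimal places.

```python
import math

def z_deviation(r, rho, n):
    """Return (z - zeta) divided by its S.E., 1/sqrt(n - 3)."""
    se = 1.0 / math.sqrt(n - 3)
    return (math.atanh(r) - math.atanh(rho)) / se

def fiducial_limits_rho(r, n, normal_point=1.96):
    """Limits for rho from z -/+ normal_point * S.E., transformed back by tanh."""
    se = 1.0 / math.sqrt(n - 3)
    z = math.atanh(r)
    return math.tanh(z - normal_point * se), math.tanh(z + normal_point * se)

ratio = z_deviation(0.7, 0.5, 28)
low, high = fiducial_limits_rho(0.7, 28)
print(round(ratio, 2), round(low, 2), round(high, 2))
```

The small differences from the tabular working arise from rounding $z$ to 0.87 and $\zeta$ to 0.55.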


95. Comparison of correlations in independent samples 
Next suppose that two independent samples of $n_1$ and $n_2$ pairs
give correlation coefficients of $r_1$ and $r_2$ respectively. May they be
regarded as drawn from the same population; or is the difference
between $r_1$ and $r_2$ significant? On the assumption that the samples
are from the same normal population, the difference between the
values of $z$ for the two samples is normally distributed with S.E.

$$ \epsilon = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}. $$

The two values of $z$ are

$$ z_1 = \tfrac12\log\frac{1 + r_1}{1 - r_1}, \qquad z_2 = \tfrac12\log\frac{1 + r_2}{1 - r_2}. $$

If $|z_1 - z_2|$ is less than $2\epsilon$, the difference is not significant at the
5 % level; and the assumption that the samples are from the same
population, or from equally correlated normal populations, is not
discredited.

Example. The first of two samples consists of 23 pairs, and gives a
correlation of 0.5; while the second, of 28 pairs, has a correlation of 0.8. Are these
values significantly different?

On the hypothesis that they are from the same normal population, the
S.E. of the difference of the $z$'s is

$$ \epsilon = \sqrt{\tfrac{1}{20} + \tfrac{1}{25}} = \sqrt{0.09} = 0.3. $$

From the table we find that $z_1 = 0.55$ and $z_2 = 1.10$, so that

$$ |z_1 - z_2| = 0.55 = 1.83\epsilon, $$

which is a little less than $2\epsilon$, and is therefore not quite significant at the 5 %
level. The hypothesis is not discredited.

If in the above example the value of $r_1$ is not given, we may find the limits
between which it must lie in order that $|r_1 - r_2|$ should not be significant at
the 5 % level. For the condition to be satisfied is

$$ |z_1 - z_2| < 1.96\epsilon = 0.588. $$

Consequently

$$ 1.10 - 0.588 < z_1 < 1.10 + 0.588, $$

or

$$ 0.512 < z_1 < 1.688, $$

so that $0.47 < r_1 < 0.93$.
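
A minimal Python sketch of this comparison follows. It is not part of the original text; the function name is illustrative, and the inverse hyperbolic tangent replaces the table of $z$.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Return (|z1 - z2|, epsilon, ratio) for two independent samples."""
    eps = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    diff = abs(math.atanh(r1) - math.atanh(r2))
    return diff, eps, diff / eps

diff, eps, ratio = compare_correlations(0.5, 23, 0.8, 28)
print(round(diff, 2), round(eps, 2), round(ratio, 2))

# Limits for r1, given r2 = 0.8, within which the difference is not significant.
z2 = math.atanh(0.8)
print(round(math.tanh(z2 - 1.96 * eps), 2), round(math.tanh(z2 + 1.96 * eps), 2))
```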

96. Combination of estimates of a correlation coefficient*
Suppose that $k$ samples of $n_i$ pairs of values $(i = 1, \ldots, k)$ yield
correlation coefficients $r_i$. We may wish to enquire whether the
samples may be regarded as drawn from the same normal population
(or equally correlated ones); and, if they may be so regarded, to
obtain a combined estimate of the population value $\rho$. To test
homogeneity of the estimates $r_i$, we make the assumption that the
samples are from equally correlated populations. Then, by means
of Fisher's transformation (16), we obtain values $z_i$ of variates
which are approximately normally distributed about a common
mean, with variances $1/(n_i - 3)$. The estimate of their common
mean, which has minimum variance, is obtained by weighting
the values $z_i$ inversely† as their variances. This estimate $\bar{z}$ is therefore

$$ \bar{z} = \frac{\sum(n_i - 3)z_i}{\sum(n_i - 3)}. \tag{17} $$

Then, since the variates $z_i$ are approximately normally distributed
about $\zeta$ with variances $1/(n_i - 3)$, the sum $\sum(n_i - 3)(z_i - \bar{z})^2$ is
distributed approximately as $\chi^2$ with $k - 1$ D.F., the mean $\bar{z}$ having
been determined from the data.‡ The significance of the calculated
value of this quantity may be ascertained from the table of $\chi^2$.
We may express the above sum in a form more convenient for
numerical calculation. Thus

$$ \sum(n_i - 3)(z_i - \bar{z})^2 = \sum(n_i - 3)z_i^2 - \bar{z}^2\sum(n_i - 3) $$

$$ = \sum(n_i - 3)z_i^2 - \Bigl[\sum(n_i - 3)z_i\Bigr]^2 \Big/ \sum(n_i - 3). $$

If the calculated value of this expression is not significant as a
value of $\chi^2$ with $k - 1$ D.F., the estimates of the correlation in the
population may be regarded as homogeneous. In that case the
value of $\bar{z}$ given by (17) is an estimate of the true value, $\zeta$,
corresponding to the population coefficient $\rho$. The required estimate of
$\rho$ is then given by

$$ \rho = \tanh\bar{z}, $$

and its value may be read off from the table.

* Cf. Yates, 1934, 6.

† Cf. Ex. IV, 6.

‡ For details of this part of the proof see Ex. X, 10 below.

Example. Independent samples of 21, 30, 39, 26 and 35 pairs of values
yielded correlation coefficients 0.39, 0.61, 0.43, 0.54 and 0.48 respectively.
May these estimates be regarded as homogeneous? If so, find an estimate of
the correlation in the population.

The corresponding values of $z_i$ may be taken from the table, and the
calculation tabulated as follows:

    r_i       z_i      n_i - 3    (n_i - 3) z_i    (n_i - 3) z_i²
    0.39     0.412       18          7.416            3.055
    0.61     0.709       27         19.143           13.572
    0.43     0.460       36         16.560            7.618
    0.54     0.604       23         13.892            8.391
    0.48     0.524       32         16.768            8.786
    Totals     -        136         73.779           41.422

The value of $\chi^2$ from the data is therefore

$$ \chi^2 = 41.422 - (73.779)^2/136 = 1.4 \text{ approximately}. $$

For 4 D.F. this value is not at all significant, so that the coefficients may be
regarded as homogeneous. From (17) we have

$$ \bar{z} = 73.779/136 = 0.5425. $$

The table then gives the estimate of $\rho$ for the population as 0.495, or 0.5
nearly.
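
The tabular calculation may be sketched in Python as follows. The code is not part of the original text and the function name is illustrative. It computes each $z_i$ from the inverse hyperbolic tangent rather than from the table, so the sums differ slightly in the third decimal place from those printed above.

```python
import math

def combine_correlations(rs, ns):
    """Homogeneity chi-square and combined estimate of rho (section 96).

    Each z_i = atanh(r_i) is weighted by n_i - 3, the reciprocal of its variance.
    Returns (chi2, degrees_of_freedom, z_bar, rho_estimate).
    """
    weights = [n - 3 for n in ns]
    zs = [math.atanh(r) for r in rs]
    sw = sum(weights)
    swz = sum(w * z for w, z in zip(weights, zs))
    swz2 = sum(w * z * z for w, z in zip(weights, zs))
    z_bar = swz / sw                      # equation (17)
    chi2 = swz2 - swz ** 2 / sw           # sum of (n_i - 3)(z_i - z_bar)^2
    return chi2, len(rs) - 1, z_bar, math.tanh(z_bar)

chi2, df, z_bar, rho = combine_correlations(
    [0.39, 0.61, 0.43, 0.54, 0.48], [21, 30, 39, 26, 35]
)
print(round(chi2, 2), df, round(z_bar, 3), round(rho, 3))
```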
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EXAMPLES X 

1. A random sample of 16 values from a normal population
showed a mean of 41.5 in., and a sum of squares of deviations from
this mean equal to 135 in.² Show that the assumption of a mean
of 43.6 in. for the population is not reasonable, and that the 95 %
fiducial limits for this mean are 39.9 and 43.1 in.

Another sample of 20 values from an unknown population has a
mean of 43.0 in., and a sum of squares of deviations from this mean
equal to 171 in.² Show that the two samples may be regarded as
from the same normal population.

2. Nine patients, to whom a certain drink was administered,
registered the following increments in blood pressure: 7, 3, -1, 4,
-3, 5, 6, -4, 1. Show that the data do not indicate that the drink
was responsible for these increments.

On the null hypothesis the above values are regarded as a random
sample from a normal population whose mean is zero. The data
give $\bar{x} = 2$ and $s^2 = 63/4$. Hence $t = 1.51$ with $\nu = 8$. This value is
not significant.

3. In testing the superiority of Leake's drill over the ordinary
drill, plots in the form of long strips were cultivated, two adjacent
strips being allotted at random to Leake's drill and the ordinary.*
For ten such pairs of plots the values of the excess of the weight of
grain from the plot treated by Leake's drill over that obtained by
use of the ordinary drill were 2.4, 1.0, 0.7, 0.0, 1.1, 1.6, 1.1, -0.4,
0.1 and 0.7. Show that the data furnish strong evidence of the
superiority of Leake's drill.

From the data $\bar{x} = 0.83$ and $\sum(x - \bar{x})^2 = 6.001$, so that

$$ s^2 = 6.001/9 = 0.667. $$

On the null hypothesis the mean of the population is zero, and
$t = 3.22$ for 9 D.F. This value belongs to about the 1 % level of
significance. There is thus strong evidence against the null
hypothesis.

* Data from Wishart, 1934, 3, p. 32.

4. For a random sample of 10 pigs, fed on diet A, the increases in
weight in a certain period were

10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lb.

For another random sample of 12 pigs, fed on diet B, the increases
in the same period were

7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17 lb.

Show that, by the test of § 88, the mean increases of 12 and 15 lb.
in the two samples are not significantly different.

[$s^2 = (120 + 314)/20 = 21.7$; $t = 1.5$, $\nu = 20$.]

5. Show that the estimates of the population variance from the
samples in Ex. 4 are not significantly different.

6. Deduce from § 82 (26), that the statistic $r\sqrt{n - 2}/\sqrt{1 - r^2}$
conforms to the $t$ distribution for $n - 2$ D.F.

7. A random sample of 18 pairs from a bivariate normal population
showed a correlation coefficient of 0.3. Is this value significant
of correlation in the population? Prove by the method of § 89,
Ex. 2, that the least value of $r$ significant at the 5 % level is
about 0.47.

8. A random sample of 19 pairs from a bivariate normal population
showed a correlation of 0.65. Prove that this is consistent with
the assumption of a correlation of 0.40 in the population. Also
show that the 95 % fiducial limits for $\rho$ are 0.28 and 0.85
approximately.

A second sample, of 23 pairs, showed a correlation of 0.40. Prove
by the method of § 95 that the two samples may be regarded as
from equally correlated populations.

9. Show that the mean value of the positive square root of a
Beta variate of the second kind, with parameters $l$ and $m$, is

$$ \Gamma(l + \tfrac12)\,\Gamma(m - \tfrac12)/\Gamma(l)\,\Gamma(m). $$

Deduce that the mean value of $|t|$ for $\nu$ D.F. is

$$ \Gamma(\tfrac12(\nu - 1))\sqrt{\nu}/\Gamma(\tfrac12\nu)\sqrt{\pi}. $$

Also show that, for samples from an uncorrelated bivariate normal
population in which the variances are $\sigma_1^2$ and $\sigma_2^2$, the mean value of
the modulus of the regression coefficient of $y$ on $x$ is

$$ \sigma_2\,\Gamma(\tfrac12(n - 2))/\sigma_1\,\Gamma(\tfrac12(n - 1))\sqrt{\pi}. $$

10. To prove that the sum $\sum(n_i - 3)(z_i - \bar{z})^2$ of § 96 is distributed
approximately as $\chi^2$ with $k - 1$ D.F., we may proceed by the method
of § 77. Denoting the sum by $T^2$ we have

$$ T^2 = \sum(n_i - 3)[(z_i - \zeta) - (\bar{z} - \zeta)]^2 $$

$$ = \sum(n_i - 3)(z_i - \zeta)^2 - 2(\bar{z} - \zeta)\sum(n_i - 3)(z_i - \zeta) + (\bar{z} - \zeta)^2\sum(n_i - 3) $$

$$ = \sum(n_i - 3)(z_i - \zeta)^2 - \xi_1^2, \tag{i} $$

where $\xi_1 = \sum(n_i - 3)(z_i - \zeta)/\sqrt{\sum(n_i - 3)}$.

Now introduce an orthogonal linear transformation

$$ \xi_i = \sum_j c_{ij}x_j \qquad (i = 1, 2, \ldots, k), $$

where $x_i = (z_i - \zeta)\sqrt{n_i - 3}$,

and the first of the $\xi$'s is identical with $\xi_1$ above. Then, since the $x$'s
are independently and normally distributed about zero with unit
S.D., so are the $\xi$'s; and in virtue of (i)

$$ T^2 = \sum_{i=1}^{k}\xi_i^2 - \xi_1^2 = \sum_{i=2}^{k}\xi_i^2. $$

Consequently $T^2$ is distributed like $\chi^2$ with $k - 1$ D.F.

11. Show that, for the $t$ distribution with $\nu$ D.F., the moment of
order $r$ (even) about the origin is, for $r < \nu$,

$$ (r - 1)(r - 3)\ldots 1 \cdot \nu^{r/2}/(\nu - 2)(\nu - 4)\ldots(\nu - r). $$


12. For samples of $n$ from a population with the exponential
distribution $f(x) = e^{-x}$, $x \ge 0$, show that the range conforms to

$$ \phi(w) = (n - 1)\,e^{-w}(1 - e^{-w})^{n-2}, $$

and hence that

$$ E(w) = 1 + \tfrac12 + \tfrac13 + \tfrac14 + \ldots + \frac{1}{n - 1}. $$



CHAPTER XI 


ANALYSIS OF VARIANCE AND COVARIANCE

Analysis of Variance

97. Resolution of the 'sum of squares'

In the words of its author, R. A. Fisher, analysis of variance is
the 'separation of the variance ascribable to one group of causes
from the variance ascribable to other groups'.* It is a procedure by
which the variation embodied in the data of the sample may be
resolved into component variations due to independent factors.
Each of the components yields an estimate of the population
variance; and these estimates are tested for homogeneity by means of
the $F$ table.

Consider a random sample of $N$ values of a normally distributed
variable $x$. It is frequently possible to arrange these in classes
according to a certain factor or criterion. For instance, if the
variable is the price of a certain commodity, the classes may
correspond to different seasons or to different districts. Or, if the
variable is the crop yield of a variety of cereal, the classes may
correspond to different manurial treatments. Let $x_{ij}$ denote the
value of the $j$th member in the $i$th class. Thus the first subscript
indicates the class, and the second the position in that class. Let
$n_i$ be the number of members in the $i$th class, $\bar{x}_i$ the mean value for
that class, and $\bar{x}$ the general mean for the whole sample of $N$ values.
Then

$$ N\bar{x} = \sum_i\sum_j x_{ij}, \qquad \sum_i\sum_j (x_{ij} - \bar{x}) = 0 \tag{1} $$

and

$$ n_i\bar{x}_i = \sum_j x_{ij}, \qquad \sum_j (x_{ij} - \bar{x}_i) = 0. \tag{2} $$

To resolve the sum of squares of the deviations of the $N$ values $x_{ij}$
from the general mean, we may do so first for the members of the
$i$th class. Thus, in virtue of § 3 (10),

$$ \sum_j (x_{ij} - \bar{x})^2 = \sum_j (x_{ij} - \bar{x}_i)^2 + n_i(\bar{x}_i - \bar{x})^2, $$

and on summing this result for all the classes we have the required
resolution of the 'sum of squares'

$$ \sum_i\sum_j (x_{ij} - \bar{x})^2 = \sum_i\sum_j (x_{ij} - \bar{x}_i)^2 + \sum_i n_i(\bar{x}_i - \bar{x})^2. \tag{3} $$

* Fisher, 1938, 2, p. 216.

This formula holds, of course, whether the population is normal
or not.
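
The resolution (3) is a purely algebraic identity, and may be verified numerically for any arrangement of values into classes. The Python sketch below is not part of the original text; the function name and the three small classes of values are hypothetical and serve only to illustrate the identity.

```python
def resolve_sum_of_squares(classes):
    """Return (total, within, between) sums of squares for a list of classes.

    total   : squared deviations of all values from the general mean
    within  : squared deviations of each value from its own class mean
    between : n_i times the squared deviation of each class mean
              from the general mean, summed over classes
    """
    all_values = [x for cls in classes for x in cls]
    general_mean = sum(all_values) / len(all_values)
    total = sum((x - general_mean) ** 2 for x in all_values)
    within = 0.0
    between = 0.0
    for cls in classes:
        class_mean = sum(cls) / len(cls)
        within += sum((x - class_mean) ** 2 for x in cls)
        between += len(cls) * (class_mean - general_mean) ** 2
    return total, within, between

# Hypothetical values arranged in three classes of unequal size.
total, within, between = resolve_sum_of_squares([[3, 5, 4], [6, 8, 7, 7], [2, 1, 3]])
print(round(total, 6), round(within + between, 6))
```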


98. Homogeneous population. One criterion of classification 
Now suppose that the population, from which the random sample
of $N$ values was drawn, is homogeneous with respect to the factor of
classification, that is to say, that the factor has no effect upon the
value of the variate. Then if the population is divided into classes
according to this factor, the different classes will have the same
statistical properties. In particular, they will have the same mean
$\mu$ and the same variance $\sigma^2$, which are the mean and the variance of
the population. Then, from the various sums in (3), we can obtain
three unbiased estimates of $\sigma^2$. For, by (7) and (8) of § 64, the
expected value of the first sum in sampling is $(N - 1)\sigma^2$; so that this
sum, divided by $N - 1$, gives an unbiased estimate of $\sigma^2$ based on
$N - 1$ D.F. Similarly, the values in the $i$th class constitute a
random sample whose mean is $\bar{x}_i$, so that

$$ E\sum_j (x_{ij} - \bar{x}_i)^2 = (n_i - 1)\sigma^2, $$

and therefore*

$$ E\sum_i\sum_j (x_{ij} - \bar{x}_i)^2 = \sum_i (n_i - 1)\sigma^2 = (N - h)\sigma^2, $$

where $h$ is the number of classes. Thus the second sum in (3), divided
by $N - h$, gives an unbiased estimate of $\sigma^2$ based on $N - h$ D.F.
And, since the expected values of the two members of (3) must be
equal, that of the final sum* is $(h - 1)\sigma^2$; so that this sum, divided
by $h - 1$, gives an unbiased estimate of $\sigma^2$ based on $h - 1$ D.F. The
identity

$$ (N - 1)\sigma^2 = (N - h)\sigma^2 + (h - 1)\sigma^2 \tag{4} $$

obtained by taking expected values of the various sums in (3),
shows that degrees of freedom are additive, the number of freedoms
corresponding to the total sum being equal to the sum of the
freedoms corresponding to the partial sums. The above results are
usually tabulated as follows:

* See also Ex. XI, 1, below.


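    Source of variation     D.F.     Sum of squares                                Mean square

    Between class means     h - 1    $\sum_i n_i(\bar{x}_i - \bar{x})^2$           $\sum_i n_i(\bar{x}_i - \bar{x})^2/(h - 1)$

    Within classes          N - h    $\sum_i\sum_j (x_{ij} - \bar{x}_i)^2$         $\sum_i\sum_j (x_{ij} - \bar{x}_i)^2/(N - h)$

    Total                   N - 1    $\sum_i\sum_j (x_{ij} - \bar{x})^2$           -


In the columns headed 'D.F.' and 'Sum of squares' the items are
additive, but not in the last column which gives the estimates of $\sigma^2$.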

The argument so far holds whether the population is normal or
not. In the case of a homogeneous normal population the results
follow from the distributions of the various sums in (3). For then
the first sum divided by $\sigma^2$ is distributed like $\chi^2$ with $N - 1$ D.F., as
proved in § 77. The mean value of this sum is therefore $(N - 1)\sigma^2$.
Similarly, $\sum_j (x_{ij} - \bar{x}_i)^2/\sigma^2$ is a $\chi^2$ with $n_i - 1$ D.F.; and therefore the
second sum in (3), divided by $\sigma^2$, is distributed like $\chi^2$ with
$\sum_i (n_i - 1) = N - h$ D.F.
The mean value of this sum is therefore $(N - h)\sigma^2$. Similarly for
the final sum in (3).

In order to test the homogeneity of the estimates of $\sigma^2$ by means
of the variance ratio and the F table, it is necessary to assume that
the population is normal; for this test is founded on that assumption.
The practice is to compare the estimate 'between class means' with
that obtained 'within classes'. For the final sum in (3) represents
the variation due to the factor of classification, and the second sum
in (3) is the residual variation after the former has been removed.
If the estimate obtained between classes is significantly greater
than that within classes, we are justified in concluding that the
factor of classification exercises an influence on the value of the
variable. In that case the assumption of homogeneity is discredited,
and we must regard the population as heterogeneous. If, however,
the estimates of $\sigma^2$ are not significantly different, the test provides
no evidence against the hypothesis of a homogeneous population.
It is important to remember that the variance ratio may be tested
by means of the F table only if the two estimates of variance are
statistically independent. Since the mean of a random sample from
a normal population is distributed independently of its variance,
the two sums in the second member of (3) are independent, and the
required condition is satisfied.*

99. Calculation of the sums of squares 
In calculating the above sums of squares it is not necessary to
find the deviations from the various means. We know that the sum
of the squares of the deviations of $N$ numbers $x_s$ $(s = 1, \ldots, N)$ from
their mean $\bar{x}$ is, in virtue of §3 (11),

$$\sum_s (x_s-\bar{x})^2 = \sum_s x_s^2 - T^2/N,$$

where $T$ is the sum of the numbers. Applying this formula to the
various sums considered above we have

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = \sum_i\sum_j x_{ij}^2 - T^2/N, \qquad (5)$$

the grand total $T$ being given by

$$T = \sum_i\sum_j x_{ij}.$$

Similarly, summing first for the values in the $i$th class, we have

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = \sum_i \Bigl(\sum_j x_{ij}^2 - T_i^2/n_i\Bigr),$$

where $T_i$ is the sum of the values in the $i$th class. Consequently

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = \sum_i\sum_j x_{ij}^2 - \sum_i T_i^2/n_i. \qquad (6)$$

Subtracting (6) from (5) we find for the third sum of squares

$$\sum_i n_i(\bar{x}_i-\bar{x})^2 = \sum_i T_i^2/n_i - T^2/N. \qquad (7)$$

This result may also be obtained directly. For, since $n_i$ is the
frequency of the value $\bar{x}_i$,

$$\sum_i n_i(\bar{x}_i-\bar{x})^2 = \sum_i n_i\bar{x}_i^2 - \Bigl(\sum_i n_i\bar{x}_i\Bigr)^2/N = \sum_i T_i^2/n_i - T^2/N,$$

as stated.

* See also Fisher, 1926, 1 and Irwin, 1934, 4.

Since the deviations from the means are independent of the choice
of origin, the results obtained by using (5), (6) and (7) are unaltered
by a change of origin. In other words, if all the values $x_{ij}$ are
decreased (or increased) by the same constant, the values obtained
for the three sums of squares are unchanged. The arithmetic may
often be simplified in this way, large numbers being replaced by
much smaller ones.
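
The formulae (5), (6) and (7) may be sketched in code as follows. This is a minimal illustration only: the function name `one_way_sums` and the small data set are hypothetical and do not come from the text. The second call repeats the calculation after a change of origin, to illustrate that the three sums are unaltered.

```python
def one_way_sums(classes):
    """Sums of squares for one criterion of classification.

    `classes` holds one list of values per class (sizes may differ).
    Returns (total, within, between), following formulae (5), (6), (7).
    """
    N = sum(len(c) for c in classes)
    T = sum(sum(c) for c in classes)                        # grand total
    raw = sum(x * x for c in classes for x in c)            # sum of x_ij^2
    by_class = sum(sum(c) ** 2 / len(c) for c in classes)   # sum of T_i^2 / n_i
    total = raw - T * T / N           # formula (5)
    within = raw - by_class           # formula (6)
    between = by_class - T * T / N    # formula (7)
    return total, within, between


# Hypothetical data: three classes of unequal size.
data = [[21, 24, 27], [30, 33], [18, 20, 22, 24]]
shifted = [[x - 20 for x in c] for c in data]   # origin moved to x = 20

print(one_way_sums(data))
print(one_way_sums(shifted))
```

The total is the sum of the other two components, so in practice only two of the three need be computed from the raw totals.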


Example. 35 plots, of approximately equal fertility, were sown with 7 different
varieties of wheat, 5 plots to each variety, the distribution of varieties among
the plots being random. The following table gives the yields of grain in
bushels per acre, the 7 columns corresponding to the different varieties. Do
the data (fictitious) indicate a significant difference in the yields of the
varieties?

    13  15  14  14  17  15  16
    11  11  10  10  15   9  12
    10  13  12  15  14  13  13
    16  18  13  17  19  14  15
    12  12  11  10  12  10  11

The classification is according to variety. The number of classes is $h = 7$,
and the number of items in each class is $n_i = 5$. Consequently $N = 35$. The
arithmetic is simplified by shifting the origin to (say) $x = 12$. Diminishing
all the yields by 12 we may rewrite the table:


     1   3   2   2   5   3   4
    -1  -1  -2  -2   3  -3   0
    -2   1   0   3   2   1   1
     4   6   1   5   7   2   3
     0   0  -1  -2   0  -2  -1

from which we have

$$T_i = 2,\ 9,\ 0,\ 6,\ 17,\ 1,\ 7; \qquad T = 42;$$

$$\bar{x}_i = 0.4,\ 1.8,\ 0,\ 1.2,\ 3.4,\ 0.2,\ 1.4.$$

Hence

$$\sum_i\sum_j x_{ij}^2 = 266, \qquad T^2/N = 50.4, \qquad \sum_i T_i^2/n_i = 92.$$

The three sums of squares are therefore, by (5), (6) and (7),

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = 266 - 50.4 = 215.6,$$

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = 266 - 92 = 174,$$

$$\sum_i n_i(\bar{x}_i-\bar{x})^2 = 92 - 50.4 = 41.6.$$

In tabular form:

Source of variation    D.F.    Sum of squares    Mean square

Between varieties        6          41.6             6.93
Within varieties        28         174               6.21
Total                   34         215.6              -


For $\nu_1 = 6$ and $\nu_2 = 28$ the value 1.1 of $F$ is not significant. Since the estimates
of variance between varieties and within varieties are not significantly
different, the experiment as a whole does not indicate significant variation
in the yields of varieties.
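
The arithmetic of this example may be checked by a short calculation. The following is a sketch only: the variable names are illustrative, and the yields are entered as read from the table of the example (5 rows of plots by 7 columns of varieties), with the origin shifted to 12 as in the text.

```python
# Yields in bushels per acre: 5 rows (plots) by 7 columns (varieties).
yields = [
    [13, 15, 14, 14, 17, 15, 16],
    [11, 11, 10, 10, 15, 9, 12],
    [10, 13, 12, 15, 14, 13, 13],
    [16, 18, 13, 17, 19, 14, 15],
    [12, 12, 11, 10, 12, 10, 11],
]
origin = 12
x = [[v - origin for v in row] for row in yields]

h = len(x[0])          # number of classes (varieties)
n = len(x)             # number of items in each class
N = h * n

col_totals = [sum(row[i] for row in x) for i in range(h)]   # T_i
T = sum(col_totals)
raw = sum(v * v for row in x for v in row)                  # sum of x_ij^2

total = raw - T * T / N                                     # formula (5)
between = sum(t * t for t in col_totals) / n - T * T / N    # formula (7)
within = total - between                                    # equivalent to (6)

ms_between = between / (h - 1)
ms_within = within / (N - h)
F = ms_between / ms_within

print(col_totals, T, raw)
print(total, within, between)
print(ms_between, ms_within, F)
```

The variance ratio obtained in this way is then referred to the F table with $h-1$ and $N-h$ D.F., as in the text.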

100. Two criteria of classification 

Consider next the case in which the $N$ values of the data may
be classified according to two different criteria, A and B. For
simplicity suppose that A determines $h$ different classes, and B
determines $k$ different groups; also that the $hk$ values of the variable
are such that, in each of the $h$ classes there is one value from each
group, and in each of the $k$ groups one value from each class. For
the purpose of calculation the $hk$ values may be arranged in a
rectangular array of $h$ columns and $k$ rows, the columns corresponding
to classification A and the rows to B. The double suffix
notation will indicate that $x_{ij}$ belongs to the $i$th class and the $j$th
group; and, in the rectangular array, this value occurs in the $i$th
column and the $j$th row. As before $\bar{x}$ denotes the general mean;
$\bar{x}_i$ is the mean of the values in the $i$th class, and $\bar{x}_j$ the mean of those
in the $j$th group.

The argument leading to (3) is valid here also, the resolution
expressed by that equation being with respect to the means of
columns. We may use (3) again to resolve the sum $\sum_i\sum_j (x_{ij}-\bar{x}_i)^2$,
this time, however, with respect to the means of rows. In doing so
we take as a new variable

$$X_{ij} = x_{ij} - \bar{x}_i,$$

each value of the original variable being diminished by the mean of
the column in which it lies. The values $X_{ij}$ may be arranged in $h$
columns and $k$ rows corresponding to those of $x_{ij}$. Now the general
mean of $X_{ij}$ is zero, since the means of $x_{ij}$ and $\bar{x}_i$ are each $\bar{x}$. Thus
$\bar{X} = 0$. Similarly the mean of the values $X_{ij}$ in the $j$th row is

$$\bar{X}_j = \bar{x}_j - \bar{x}.$$

Accordingly, on applying (3) to the quantities $X_{ij}$, but taking row
means instead of column means, we have

$$\sum_i\sum_j X_{ij}^2 = \sum_j h\bar{X}_j^2 + \sum_i\sum_j (X_{ij}-\bar{X}_j)^2,$$

or its equivalent

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = \sum_j h(\bar{x}_j-\bar{x})^2 + \sum_i\sum_j (x_{ij}-\bar{x}_i-\bar{x}_j+\bar{x})^2. \qquad (8)$$

Substituting this value in (3), and remembering that the numbers
$n_i$ are each equal to $k$, we have the resolution expressed by

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = \sum_i k(\bar{x}_i-\bar{x})^2 + \sum_j h(\bar{x}_j-\bar{x})^2 + \sum_i\sum_j Y_{ij}^2, \qquad (9)$$

where $Y_{ij} = x_{ij}-\bar{x}_i-\bar{x}_j+\bar{x}$.

As in the preceding section, the expected value of the first member
of (9) is $(hk-1)\sigma^2$. Similarly, on the assumption of a homogeneous
population, the expected values of the first two sums in the second
member are $(h-1)\sigma^2$ and $(k-1)\sigma^2$ respectively. That of the final
sum is therefore $(hk-h-k+1)\sigma^2 = (h-1)(k-1)\sigma^2$. Thus the various sums in (9),
divided by $(hk-1)$, $(h-1)$, $(k-1)$ and $(h-1)(k-1)$ respectively,
give unbiased estimates of the population variance based on degrees
of freedom represented by these divisors. When the population is
normal the four sums in (9), divided by $\sigma^2$, are distributed like
$\chi^2$ with degrees of freedom as stated above; and the mean values of
the various sums follow from this. The above results are usually
tabulated:


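Variation          D.F.            Sum of squares                       Mean square

Between classes    $h-1$           $k\sum_i (\bar{x}_i-\bar{x})^2$      The quotient of 'sum
Between groups     $k-1$           $h\sum_j (\bar{x}_j-\bar{x})^2$      of squares' by D.F.
Error              $(h-1)(k-1)$    $\sum_i\sum_j Y_{ij}^2$              in each case
Total              $hk-1$          $\sum_i\sum_j (x_{ij}-\bar{x})^2$    -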


Degrees of freedom are additive, as well as sums of squares. The
variation corresponding to the factors of classification is represented
by the first two sums in the second member of (9). The
uncontrolled variation represented by the residual sum $\sum_i\sum_j Y_{ij}^2$ is
due to a variety of causes, which are grouped under the term
'error'.

To test the hypothesis of homogeneity in the population, we 
compare the two estimates of variance obtained between classes and 
between groups with that obtained from error. If the factor corre- 
sponding to either classification has a significant effect upon the 
value of the variable, this will appear in the corresponding mean 
square. In order to test the variance ratio by means of the F table* 
the population must be assumed normal. If either of the first two 
estimates of variance is significantly different from that obtained 
from error, the hypothesis of homogeneity is discredited. The 
significance of the difference of the means of any two classes, or 
any two groups, may be tested by means of the t table, as in the 
example below. 

* The independence of these estimates of variance may be established
by Cochran's method. See 1934, 6.

Example. An agricultural experiment was conducted to test the effects
of change of soil (5 blocks) and variety of wheat (7 different strains) on the
yield of grain. Each block was divided into seven plots, and the plots of each
block were assigned at random to the seven varieties. The yields, in bushels
per acre, are set out in the same rectangular array as in the example of
§99, columns corresponding to varieties and rows to blocks. Discuss the
significance of the variation of yield with the two factors.

With origin at $x = 12$ the values of $\bar{x}_i$, the total sum of squares and
the sum of squares corresponding to varieties are as already found. Show
similarly that

$$T_j = 20,\ -6,\ 6,\ 28,\ -6;$$

$$\bar{x}_j = 2.86,\ -0.86,\ 0.86,\ 4,\ -0.86;$$

$$\sum_j T_j^2/7 = 184.6, \qquad \sum_j T_j^2/7 - T^2/35 = 134.2.$$

The tabulation of results is:

Source of variation    D.F.    Sum of squares    Mean square    F

Varieties                6          41.6             6.93        4.2**
Blocks                   4         134.2            33.55       20.2**
Error                   24          39.8             1.66        -
Total                   34         215.6              -          -

The double asterisk indicates that the value of $F$ is significant at the 1 % level
(a single asterisk denoting significance at the 5 % level). Thus the yields of
the varieties are significantly different; and the experiment indicates a very
marked variation in soil fertility from one block to another.

We may use the $t$ test to examine more closely the difference between the
mean yields of any two varieties, say the second and third. The difference is
$\bar{x}_2 - \bar{x}_3 = 1.8$. To test the significance of this we use the estimate of variance
obtained from 'error', and corresponding to 24 D.F. This value, 1.66, is the
estimated variance of the yield of a single plot. The estimated variance of
the mean of 5 plots is thus 1.66/5; and that of the difference of the means of
two independent samples of 5 plots each is $1.66 \times 2/5 = 0.664$. The S.E. of
the difference of the means of two varieties is therefore 0.815 nearly. The
value of $t$ for the above two varieties is thus $1.8/0.815 = 2.2$. For 24 D.F.
this is significant at the 5 % level. The least difference, $m$, that is significant
is given by $m/0.815 = 2.06$, so that $m = 1.68$ nearly. Show also that the
S.E. of the difference between the mean yields for two blocks is 0.69, and that
a difference of 1.4 is significant at the same level.
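
The resolution (9) and the subsequent $t$ test for this example may be sketched as follows. The variable names are illustrative only, and the yields are those of the example of §99 with the origin at 12. No table of $F$ or $t$ is consulted; the sketch only reproduces the arithmetic.

```python
from math import sqrt

# Rows correspond to blocks, columns to varieties (origin not yet shifted).
yields = [
    [13, 15, 14, 14, 17, 15, 16],
    [11, 11, 10, 10, 15, 9, 12],
    [10, 13, 12, 15, 14, 13, 13],
    [16, 18, 13, 17, 19, 14, 15],
    [12, 12, 11, 10, 12, 10, 11],
]
x = [[v - 12 for v in row] for row in yields]

k = len(x)        # number of groups (blocks, rows)
h = len(x[0])     # number of classes (varieties, columns)
N = h * k

col_totals = [sum(row[i] for row in x) for i in range(h)]   # T_i
row_totals = [sum(row) for row in x]                        # T_j
T = sum(row_totals)
raw = sum(v * v for row in x for v in row)

ss_total = raw - T * T / N
ss_varieties = sum(t * t for t in col_totals) / k - T * T / N
ss_blocks = sum(t * t for t in row_totals) / h - T * T / N
ss_error = ss_total - ss_varieties - ss_blocks              # found by subtraction

ms_error = ss_error / ((h - 1) * (k - 1))
F_varieties = (ss_varieties / (h - 1)) / ms_error
F_blocks = (ss_blocks / (k - 1)) / ms_error

# S.E. of the difference of two variety means (5 plots each)
# and of two block means (7 plots each).
se_varieties = sqrt(2 * ms_error / k)
se_blocks = sqrt(2 * ms_error / h)
t_2_vs_3 = (col_totals[1] / k - col_totals[2] / k) / se_varieties

print(row_totals, ss_blocks, ss_error)
print(F_varieties, F_blocks)
print(se_varieties, se_blocks, t_2_vs_3)
```

The error mean square, based on $(h-1)(k-1) = 24$ D.F., is the only estimate of variance used for the standard errors.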

101. The Latin square. Three criteria of classification 

In the general case of three criteria of classification, corresponding
to $h$, $k$, $p$ classes respectively, we should require a three-dimensional
generalization of the rectangular array employed above. In the
particular case for which $h = k = p$ this requirement is obviated by
an arrangement known as the Latin square. As it is customary to
use the letters A, B, C, ... to distinguish the different classes of one
of the three classifications, we may conveniently explain the Latin
square, of order $n$, as an arrangement of the $n$ letters A, B, C, ...
in the form of a square array, such that each letter occurs once in
each column and once in each row. Consequently each letter occurs
$n$ times in a Latin square of order $n$. The accompanying array is one
arrangement for a Latin square of order five.

    B  D  E  A  C
    C  A  B  E  D
    D  C  A  B  E
    E  B  C  D  A
    A  E  D  C  B

The triple classification and the Latin square arrangement are
illustrated by the design of the following agricultural experiment.
The variable is the yield of grain per acre, and the object of the
experiment is to test the effect on yield due to change of manurial
treatment, and to variation of soil in each of two perpendicular
directions. A block of land is divided into plots, arranged in $n$
parallel rows in one of the given directions and $n$ parallel columns in
the perpendicular direction. The $n$ different treatments are distributed
at random among the $n$ plots of each row, but in such a
way that no two plots in the same column are given the same treatment.
We thus have a Latin square in which letters correspond to
treatments, while rows and columns correspond to soil variation in
the given perpendicular directions.
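
The defining property of the Latin square is easily checked mechanically. The following sketch uses a hypothetical helper `is_latin_square`, and takes the array of order five given above with its second row read as C A B E D.

```python
def is_latin_square(square):
    """Return True if each letter occurs once in each row and each column.

    `square` is a list of n rows, each a string (or list) of n letters.
    """
    n = len(square)
    if n == 0 or any(len(row) != n for row in square):
        return False
    letters = set(square[0])
    if len(letters) != n:
        return False
    rows_ok = all(set(row) == letters for row in square)
    cols_ok = all({row[c] for row in square} == letters for c in range(n))
    return rows_ok and cols_ok


# The arrangement of order five (second row taken as C A B E D).
square = ["BDEAC", "CABED", "DCABE", "EBCDA", "AEDCB"]
print(is_latin_square(square))
```

A randomly chosen treatment plan can be screened in the same way before the plots are laid out.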

The necessary formulae for calculation are obtained by an easy
extension of the foregoing results. With the same notation as in the
preceding sections, the resolution expressed by (9) is still valid. The
final sum $\sum_i\sum_j Y_{ij}^2$ may be separated into components by applying
(3) to the variable $Y_{ij}$ and resolving, not with respect to the means
of rows or columns, but with respect to the means of letters. Let
$\bar{Y}$ denote the general mean of the values $Y_{ij}$, and $\bar{Y}_l$ the mean of the
values for the $l$th letter. Then, since the mean of each of the four
terms in $Y_{ij}$ has the same magnitude $\bar{x}$, we have $\bar{Y} = 0$. To calculate
$\bar{Y}_l$ we consider the four terms of $Y_{ij}$ separately. The mean value of
$x_{ij}$ corresponding to the $l$th letter we shall denote by $\bar{x}_l$. The mean
of the values $\bar{x}_i$ for the $l$th letter is $\bar{x}$, since each column contains the
$l$th letter once only. Similarly the mean of the values $\bar{x}_j$ for the $l$th
letter is $\bar{x}$. The last term of $Y_{ij}$ is constant, and we have the result

$$\bar{Y}_l = \bar{x}_l - \bar{x} - \bar{x} + \bar{x} = \bar{x}_l - \bar{x}.$$

Thus, on applying (3) to the values $Y_{ij}$ and the letters, we deduce

$$\sum_i\sum_j (Y_{ij}-\bar{Y})^2 = n\sum_l (\bar{Y}_l-\bar{Y})^2 + \sum_i\sum_j (Y_{ij}-\bar{Y}_l)^2,$$

or its equivalent

$$\sum_i\sum_j Y_{ij}^2 = n\sum_l (\bar{x}_l-\bar{x})^2 + \sum_i\sum_j (x_{ij}-\bar{x}_i-\bar{x}_j-\bar{x}_l+2\bar{x})^2.$$

Substituting this value in (9), and putting $h = k = n$, we obtain
the required resolution

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = n\sum_i (\bar{x}_i-\bar{x})^2 + n\sum_j (\bar{x}_j-\bar{x})^2 + n\sum_l (\bar{x}_l-\bar{x})^2 + \sum_i\sum_j Z_{ij}^2, \qquad (10)$$

where $Z_{ij} = x_{ij}-\bar{x}_i-\bar{x}_j-\bar{x}_l+2\bar{x}$.

The expected value of the first member of (10) is $(n^2-1)\sigma^2$, and
on the assumption of a homogeneous population, that of each of the
first three sums on the right is $(n-1)\sigma^2$, as proved in §98. Consequently

$$E\Bigl(\sum_i\sum_j Z_{ij}^2\Bigr) = [n^2-1-3(n-1)]\sigma^2 = (n-1)(n-2)\sigma^2.$$

Thus each of the first three sums in the second member of (10),
divided by $(n-1)$, gives an unbiased estimate of $\sigma^2$ based on
$(n-1)$ D.F.; and the final sum, divided by $(n-1)(n-2)$, gives an
unbiased estimate based on that number of freedoms. In the case
of a normal population the various sums in (10), divided by $\sigma^2$, are
distributed like $\chi^2$ with D.F. equal to those of the corresponding
estimates of $\sigma^2$. The results may be tabulated as shown below.
The first three mean squares are compared with that obtained from
error in the usual manner. The significance of the difference of the
means of two classes may be tested by the $t$ table as illustrated
earlier.

The various sums of squares may be calculated from the usual
formulae, putting $N = n^2$ and $h = k = n$. Thus

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = \sum_i\sum_j x_{ij}^2 - T^2/n^2.$$

Similarly,

$$n\sum_l (\bar{x}_l-\bar{x})^2 = \sum_l T_l^2/n - T^2/n^2,$$

where $T_l$ is the sum of the values for the $l$th letter; and so on. By
subtraction we have therefore

$$\sum_i\sum_j Z_{ij}^2 = \sum_i\sum_j x_{ij}^2 - \Bigl(\sum_i T_i^2 + \sum_j T_j^2 + \sum_l T_l^2\Bigr)/n + 2T^2/n^2.$$

It is usual to find this sum by subtraction after the other sums
have been calculated.


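Source of variation    D.F.            Sum of squares                       Mean square

Columns                $n-1$           $n\sum_i (\bar{x}_i-\bar{x})^2$
Rows                   $n-1$           $n\sum_j (\bar{x}_j-\bar{x})^2$      The quotient of 'sum
Treatments             $n-1$           $n\sum_l (\bar{x}_l-\bar{x})^2$      of squares' by D.F. in
Error                  $(n-1)(n-2)$    $\sum_i\sum_j Z_{ij}^2$              each case
Total                  $n^2-1$         $\sum_i\sum_j (x_{ij}-\bar{x})^2$    -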

Example. An agricultural experiment was conducted on the Latin square
plan to test the effect on yield due to change of treatment (5 kinds) and also
to variation of soil in each of two perpendicular directions. The results are
set out in the Latin square below ($n = 5$), in which letters correspond to
treatments, while rows and columns correspond to the two perpendicular
directions. Are the effects on yield significant?

    A  7.4    D  8.9    E  5.8    B 12.0    C 14.3
    C 11.8    B  6.5    A  8.7    E  7.6    D  7.9
    D 10.1    C 17.9    B  9.0    A  8.5    E  7.1
    E  8.8    A 10.1    C 15.7    D 11.1    B  7.4
    B 11.8    E  8.8    D 14.3    C 18.4    A 10.1


Shifting the origin to $x = 10$ (i.e. reducing each of the above yields by 10)
the reader may easily verify that

$$T_i = -0.1,\ 2.2,\ 3.5,\ 7.6,\ -3.2;$$

$$T_j = -1.6,\ -7.5,\ 2.6,\ 3.1,\ 13.4;$$

$$T_l = -5.2,\ -3.3,\ 28.1,\ 2.3,\ -11.9.$$

Consequently $T^2/N = 10 \times 10/25 = 4.00$.

The various sums of squares are given by

(Total)         $\sum_i\sum_j x_{ij}^2 - T^2/N = 285.18 - 4.00 = 281.18$
(Columns)       $\sum_i T_i^2/n - T^2/N = 17.02 - 4.00 = 13.02$
(Rows)          $\sum_j T_j^2/n - T^2/N = 50.95 - 4.00 = 46.95$
(Treatments)    $\sum_l T_l^2/n - T^2/N = 194.89 - 4.00 = 190.89$
(Error)         Remainder $= 30.32$

The tabulation is:

Source of variation    D.F.    Sum of squares    Mean square    F

Columns                  4          13.02            3.255       1.3
Rows                     4          46.95           11.737       4.6*
Treatments               4         190.89           47.722      18.9**
Error                   12          30.32            2.527       -
Total                   24         281.18             -          -


For $\nu_1 = 4$ and $\nu_2 = 12$ the 5 and 1 % values of $F$ are 3.26 and 5.41. Thus
the variation in rows is significant, and that due to treatments is highly
significant.

Show that the S.E. of the difference of the means of two classes is 1.005, and
hence that a difference of means less than 2.2 is not significant at the 5 % level.
Thus the yield due to treatment C is significantly greater than that due to
any other treatment; and the yield due to D is significantly greater than
that due to E.
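
The calculation for the Latin square may be sketched as below. The names are illustrative only. The yields are entered as read from the square of the example, the doubtful digits being taken so as to agree with the class totals and with the sums of squares quoted in the text; that reading is an assumption of this sketch.

```python
from math import sqrt

letters = ["ADEBC", "CBAED", "DCBAE", "EACDB", "BEDCA"]   # treatments by plot
yields = [
    [7.4, 8.9, 5.8, 12.0, 14.3],
    [11.8, 6.5, 8.7, 7.6, 7.9],
    [10.1, 17.9, 9.0, 8.5, 7.1],
    [8.8, 10.1, 15.7, 11.1, 7.4],
    [11.8, 8.8, 14.3, 18.4, 10.1],
]
n = len(yields)
x = [[v - 10 for v in row] for row in yields]             # origin at x = 10

T = sum(sum(row) for row in x)
correction = T * T / n ** 2                               # T^2 / N
raw = sum(v * v for row in x for v in row)

col_totals = [sum(x[r][c] for r in range(n)) for c in range(n)]   # T_i
row_totals = [sum(row) for row in x]                              # T_j
trt_totals = {}                                                   # T_l
for r in range(n):
    for c in range(n):
        key = letters[r][c]
        trt_totals[key] = trt_totals.get(key, 0.0) + x[r][c]

ss_total = raw - correction
ss_cols = sum(t * t for t in col_totals) / n - correction
ss_rows = sum(t * t for t in row_totals) / n - correction
ss_trt = sum(t * t for t in trt_totals.values()) / n - correction
ss_error = ss_total - ss_cols - ss_rows - ss_trt          # remainder

ms_error = ss_error / ((n - 1) * (n - 2))
F_cols = ss_cols / (n - 1) / ms_error
F_rows = ss_rows / (n - 1) / ms_error
F_trt = ss_trt / (n - 1) / ms_error
se_diff = sqrt(2 * ms_error / n)      # S.E. of a difference of two class means

print(ss_total, ss_cols, ss_rows, ss_trt, ss_error)
print(F_cols, F_rows, F_trt, se_diff)
```

As in the text, the error sum of squares is found by subtraction rather than from the quantities $Z_{ij}$ directly.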

102. Significance of an observed correlation ratio 

We shall next consider some applications of analysis of variance†
to testing the significance of an observed correlation ratio, coefficient
or index, and to testing the linearity of a regression. First suppose
that a value, $\eta_{yx}$, of the correlation ratio of $y$ on $x$ is obtained from a
random sample of $N$ pairs of values from a bivariate population
in which $y$ is normally distributed. We wish to test whether this
value is significant of an association between the two variables in
the population. Let the $N$ pairs of values of the variables be
arranged in arrays as in Chapter V, so that the $y$'s are classified
according to the corresponding values of $x$. Then since the subscript
$i$ indicates the array, and $j$ the position in that array, we have as in (3)

$$\sum_i\sum_j (y_{ij}-\bar{y})^2 = \sum_i\sum_j (y_{ij}-\bar{y}_i)^2 + \sum_i n_i(\bar{y}_i-\bar{y})^2, \qquad (11)$$

$\bar{y}_i$ being the mean value of $y$ in the $i$th array, and $n_i$ the number of
values in that array. In virtue of §34 this equation corresponds to
the identity

$$N s_y^2 = N s_y^2(1-\eta^2) + N s_y^2\eta^2, \qquad (12)$$

$s_y^2$ being the variance of $y$ in the sample.

† According to the definition given by Wishart and Sanders (see §105,
below) these applications should be classified under 'Analysis of Covariance'.

On the assumption that there is no association between the two
variables in the population, the $y$'s in each array may be regarded
as a random sample from the population of $y$'s. The various sums
in (11), divided by the variance $\sigma^2$ of $y$ in the population, are then
distributed like $\chi^2$ with $(N-1)$, $(N-h)$ and $(h-1)$ D.F., $h$ being the
number of arrays. The problem is therefore the same as in §98.
On taking expected values of both members of (11) we have

$$(N-1)\sigma^2 = (N-h)\sigma^2 + (h-1)\sigma^2. \qquad (13)$$

The various sums in (11), divided by the corresponding coefficients
in (13), yield unbiased estimates of $\sigma^2$. The tabulation is:


Source of variation    D.F.     Sum of squares          Mean square

Between arrays         $h-1$    $N s_y^2\eta^2$         $N s_y^2\eta^2/(h-1)$
Within arrays          $N-h$    $N s_y^2(1-\eta^2)$     $N s_y^2(1-\eta^2)/(N-h)$
Total                  $N-1$    $N s_y^2$               -


To test whether the mean square between arrays is significantly
greater than that within arrays we have

$$F = \frac{\eta^2}{1-\eta^2} \cdot \frac{N-h}{h-1},$$

with $\nu_1 = h-1$, $\nu_2 = N-h$.

Example. A random sample of 79 pairs, arranged in 7 arrays of $y$'s, gave
a correlation ratio of $y$ on $x$ equal to 0.4. Is this value significant?

Here $\nu_1 = 6$, $\nu_2 = 72$,

$$F = \frac{0.16}{0.84} \times \frac{72}{6} = 2.29^*.$$

Since the 5 % value of $F$ is 2.23, the above value is significant at that level.
We conclude that there is association between the variables in the population.
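
The variance ratio for a correlation ratio depends only on $\eta$, $N$ and $h$, and may be sketched as a small function. The name `correlation_ratio_F` is hypothetical; the sketch returns the ratio and the two numbers of D.F., leaving the comparison with the F table to the reader.

```python
def correlation_ratio_F(eta, N, h):
    """Variance ratio for an observed correlation ratio `eta`.

    N is the number of pairs and h the number of arrays.
    Returns (F, nu1, nu2) with nu1 = h - 1 and nu2 = N - h.
    """
    nu1, nu2 = h - 1, N - h
    F = eta ** 2 / (1 - eta ** 2) * nu2 / nu1
    return F, nu1, nu2


# The example above: 79 pairs in 7 arrays, eta = 0.4.
print(correlation_ratio_F(0.4, 79, 7))
```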
103. Significance of a regression function
To test the significance of a regression function is to examine
whether the data of the sample indicate any degree of association
of the variables, of the type represented by the regression equation.
We shall see that, in the case of a linear regression equation, this
amounts to testing the significance of an observed value of $r$; while,
in the case of an equation of curvilinear regression, it is equivalent
to testing the significance of an observed value of the correlation
index $R$.

Consider first a linear regression equation for a random sample
of $N$ pairs of values from a bivariate normal population. If $Y_i$ is the
estimate of $y_i$ given by the regression equation, we know by §27 (31)
that

$$\sum_i (y_i-\bar{y})^2 = \sum_i (y_i-Y_i)^2 + \sum_i (Y_i-\bar{y})^2. \qquad (14)$$

The first sum of squares in the second member is due to deviations
from the regression function, and the second sum to the regression
function itself. In virtue of §27 the various sums in (14) are equal
to the corresponding terms in the identity

$$N s_y^2 = N s_y^2(1-r^2) + N r^2 s_y^2.$$

On the assumption of an uncorrelated normal population these sums,
divided by $\sigma^2$, are distributed like $\chi^2$ with $N-1$, $N-2$ and 1 D.F.
respectively, as proved in §82. The expected values of the various
sums in (14) are therefore the corresponding terms of the identity

$$(N-1)\sigma^2 = (N-2)\sigma^2 + \sigma^2,$$

and each sum thus gives an unbiased estimate of $\sigma^2$, on division by
the appropriate number of D.F. The tabulated results are:


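Source of variation        D.F.     Sum of squares                 Mean square

Regression function        1        $\sum_i (Y_i-\bar{y})^2$       $\sum_i (Y_i-\bar{y})^2$
Deviation from the
regression function        $N-2$    $\sum_i (y_i-Y_i)^2$           $\sum_i (y_i-Y_i)^2/(N-2)$
Total                      $N-1$    $\sum_i (y_i-\bar{y})^2$       -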


If the mean square due to the regression function is significantly
greater than that due to deviations from this function, we conclude
that there is a real association between the variables, of the type
indicated by the regression equation. To test the significance we
have for the variance ratio

$$F = (N-2)r^2/(1-r^2); \qquad \nu_1 = 1, \quad \nu_2 = N-2.$$

The method is thus equivalent to testing the significance of $r$; and
it will be noted that the above statistic $F$ is the square of the statistic
$t$ of §89 (8). The two tests are therefore equivalent.

Example. A correlation of 0.5 is obtained from a random sample of 26
pairs from a normal population. Is this value significant?

Here $r^2 = \frac{1}{4}$, $N = 26$, $F = 8^{**}$, $\nu_1 = 1$, $\nu_2 = 24$.

From the table we see that this value of $F$ is highly significant of correlation
in the population.

Obtain the same result by use of the $t$ table.
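
The equivalence of the two tests may be illustrated numerically. In the sketch below the function names are hypothetical, and the expression used for $t$ is the usual one, $t = r\sqrt{N-2}/\sqrt{1-r^2}$, which is assumed here to be the statistic of §89 (8) since its square is the $F$ given above.

```python
from math import sqrt


def regression_F(r, N):
    """Variance ratio for linear regression: F = (N - 2) r^2 / (1 - r^2)."""
    return (N - 2) * r * r / (1 - r * r)


def correlation_t(r, N):
    """Statistic t with N - 2 D.F. for an observed correlation r (assumed form)."""
    return r * sqrt(N - 2) / sqrt(1 - r * r)


# The example above: r = 0.5 from 26 pairs.
F = regression_F(0.5, 26)
t = correlation_t(0.5, 26)
print(F, t, t * t)
```

Whichever table is used, the number of D.F. associated with the deviations from the regression line is $N-2$.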

In the case of a correlation index $R$ associated with a curved
regression line, the formulae of §41 show that the above argument
holds if $r^2$ is replaced by $R^2$, except that the numbers of D.F. must be
altered. If $k$ is the number of statistics that must be calculated from
the sample to obtain the regression equation, the number of D.F.
associated with deviations from the regression function is $N-k$,
and the number associated with the regression function itself is
$k-1$. For testing the significance of the regression function we have
therefore

$$F = \frac{R^2}{1-R^2} \cdot \frac{N-k}{k-1}; \qquad \nu_1 = k-1, \quad \nu_2 = N-k.$$

104. Test for non-linearity of regression 
Considering the random sample of $N$ pairs of values from a bivariate
normal population, let us return to equation (11). In virtue
of §36 (18), the final sum in that equation may be further resolved as

$$\sum_i n_i(\bar{y}_i-\bar{y})^2 = \sum_i n_i(\bar{y}_i-Y_i)^2 + \sum_i n_i(Y_i-\bar{y})^2, \qquad (15)$$

$Y_i$ being the value given by the line of regression for the $i$th array,
and the various sums being equal to the corresponding terms of the
identity

$$N s_y^2\eta^2 = N s_y^2(\eta^2-r^2) + N s_y^2 r^2.$$

The three sums in (15), divided by $\sigma^2$, are distributed like $\chi^2$ with
$h-1$, $h-2$ and 1 D.F. respectively; and on taking expected values
of both members of (15) we obtain the identity

$$(h-1)\sigma^2 = (h-2)\sigma^2 + \sigma^2.$$

The various sums in (15), divided by the corresponding numbers of
D.F., give unbiased estimates of $\sigma^2$. That obtained from the first
sum on the right is associated with deviations of the means of arrays
from the line of regression of $y$ on $x$. On the assumption of linearity
of regression this sum is due to sampling errors; and the estimate
of $\sigma^2$ obtained from it should not be significantly greater than that
derived from the sum of squares within arrays, i.e. from

$$\sum_i\sum_j (y_{ij}-\bar{y}_i)^2.$$
Tabulating as usual we have: 


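Source of variation        D.F.     Sum of squares                       Mean square

Linear regression          1        $N s_y^2 r^2$                        $N r^2 s_y^2$
Deviation of means
from regression line       $h-2$    $N s_y^2(\eta^2-r^2)$                $N s_y^2(\eta^2-r^2)/(h-2)$
Within arrays              $N-h$    $N s_y^2(1-\eta^2)$                  $N s_y^2(1-\eta^2)/(N-h)$
Total                      $N-1$    $\sum_i\sum_j (y_{ij}-\bar{y})^2$    -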


For testing whether the second mean square is significantly greater
than the third, we have the variance ratio

$$F = \frac{\eta^2-r^2}{1-\eta^2} \cdot \frac{N-h}{h-2}; \qquad \nu_1 = h-2, \quad \nu_2 = N-h.$$

If the value of $F$ is significant, the assumption of linearity of regression
is discredited. It should be observed that the value of $F$
depends not only on the difference $\eta^2-r^2$, but also on $\eta^2$, $N$ and $h$.
Thus a knowledge of $\eta^2-r^2$ is by itself insufficient to decide the
question of linearity of regression.


Example. A random sample of 200 pairs of values from a bivariate normal
population, when grouped in 10 arrays of $y$'s, gave values $r = 0.3$ and $\eta_{yx} = 0.4$.
Are these results consistent with the assumption of linearity of regression?
Here $N = 200$, $h = 10$, $\nu_1 = 8$, $\nu_2 = 190$ and

$$F = \frac{0.07}{0.84} \times \frac{190}{8} = 1.98.$$

This value is just on the 5 % level of significance. The assumption of linearity
of regression is thus rather discredited.
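
The dependence of this test on $N$ and $h$, as well as on $\eta^2 - r^2$, may be seen from a short sketch. The function name `nonlinearity_F` is hypothetical, and the second call uses an invented sample size purely for comparison.

```python
def nonlinearity_F(r, eta, N, h):
    """Variance ratio for testing linearity of regression.

    Compares the mean square for deviations of array means from the
    regression line (h - 2 D.F.) with that within arrays (N - h D.F.).
    Returns (F, nu1, nu2).
    """
    nu1, nu2 = h - 2, N - h
    F = (eta ** 2 - r ** 2) / (1 - eta ** 2) * nu2 / nu1
    return F, nu1, nu2


# The example above: N = 200, h = 10, r = 0.3, eta = 0.4.
print(nonlinearity_F(0.3, 0.4, 200, 10))

# Same r and eta, but a hypothetical smaller sample of 100 pairs:
# the difference eta^2 - r^2 is unchanged, yet F is much smaller.
print(nonlinearity_F(0.3, 0.4, 100, 10))
```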


Analysis of Covariance 

105. Resolution of the 'sum of products'. One criterion of classification

Analysis of covariance has been described as 'the technique of
testing for homogeneity in problems dealing with two or more
correlated variables'.* Its main use is to test the significance of
the difference between the mean values of a variate $y$ in certain
classes, when these have been corrected for differences in some
concomitant variable $x$. The method of doing this will be explained
shortly; but we must first study the necessary algebraical tools.

The resolution of the total variation into components, already 
studied in analysis of variance, has its counterpart in analysis of 
covariance; and the algebra of the two processes runs along parallel 
lines. The total covariation of a bivariate sample, represented by 
the sum of the products of the deviations of the variates from their 
means, may be resolved into components associated with different 
factors; and from these components, and the corresponding com- 
ponents of variation of x and of y, estimates of the coefficient of 
regression (or correlation) in the population are determined. The 
estimate from 'error' is tested for significance as in §§ 103, 89 or 90;
and, if this proves to be significant, the other estimates are tested 
by comparison with it. We are thus able to estimate the effects of 
the various factors on the degree of association of the variates. 

* Wishart and Sanders, 1936, 3, p. 46.

Suppose that the data consist of $N$ pairs of corresponding values
of the two variates, $x$ and $y$, and that these may be grouped in $h$
different classes according to a certain criterion. With the double
suffix notation, $(x_{ij}, y_{ij})$ is the $j$th pair of values in the $i$th class
$(i = 1, \ldots, h)$. The numbers of pairs in the different classes are not
necessarily equal. Let $n_i$ be the number of pairs in the $i$th class, so
that $\sum_i n_i = N$. As before let $\bar{x}$, $\bar{y}$ denote the general means of the
two variables, and $\bar{x}_i$, $\bar{y}_i$ the means of their values in the $i$th class.
Then

$$\sum_j (x_{ij}-\bar{x}_i) = 0 = \sum_j (y_{ij}-\bar{y}_i). \qquad (16)$$

The deviations of $x_{ij}$ and $y_{ij}$ from their general means are expressible as

$$x_{ij}-\bar{x} = (x_{ij}-\bar{x}_i) + (\bar{x}_i-\bar{x}), \qquad y_{ij}-\bar{y} = (y_{ij}-\bar{y}_i) + (\bar{y}_i-\bar{y}).$$

On forming their product, and summing over all pairs of values,
we have

$$\sum_i\sum_j (x_{ij}-\bar{x})(y_{ij}-\bar{y}) = \sum_i\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i) + \sum_i n_i(\bar{x}_i-\bar{x})(\bar{y}_i-\bar{y}), \qquad (17)$$

the remaining sums disappearing in virtue of (16). Corresponding
to this we have also the resolution of the sum of squares of the $x$'s,
expressed by (3), and that of the $y$'s given by a similar formula.

Suppose now that the $N$ pairs of values constitute a random
sample from a homogeneous population, in which the covariance
is $\mu_{11}$. Then, by §61, the expected value of the sum in the first
member of (17) is $(N-1)\mu_{11}$. Similarly, since the values in the $i$th
class may be regarded as a random sample of $n_i$ pairs,

$$E\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i) = (n_i-1)\mu_{11},$$

so that

$$E\sum_i\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i) = \sum_i (n_i-1)\mu_{11} = (N-h)\mu_{11}.$$

Consequently, since the expectations of the two members of (17)
must be equal, that of the final sum in the equation must be
$(h-1)\mu_{11}$. Thus the various sums in (17), divided by $N-1$, $N-h$ and
$h-1$ respectively, give unbiased estimates of the covariance of the
population, based on numbers of D.F. represented by these divisors.

Following Wishart and Sanders* we may conveniently denote
the three sums in (17) by $C''$, $C'$ and $C$ respectively, the equation
being then equivalent to $C'' = C' + C$. The corresponding sums of
squares for the $x$'s will be denoted by $A''$, $A'$ and $A$, and those for the
$y$'s by $B''$, $B'$ and $B$. Then (3) and its counterpart are equivalent to

$$A'' = A' + A, \qquad B'' = B' + B.$$

The resolution of sums of squares and products may be combined 
in a single table as follows: 


Source of          D.F.     Sums of       Sum of      Coefficient of     Coefficient of
variation                   squares       products    regression         correlation

Between classes    $h-1$    $A$, $B$      $C$         $b = C/A$          $r = C/\sqrt{AB}$
Within classes     $N-h$    $A'$, $B'$    $C'$        $b' = C'/A'$       $r' = C'/\sqrt{A'B'}$
Total              $N-1$    $A''$, $B''$  $C''$       -                  -

We thus obtain different estimates of the coefficients of regression
and correlation. By means of §§ 103, 89 or 90 we first test whether
the estimate obtained within classes is significant of correlation in
the population. If it proves to be significant, we proceed to test the
differences between the class means after these have been corrected
for regression. Incidentally we may also test the significance of the
difference between $b$ and $b'$.

106. Calculation of the sums of products 
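The sum of products of the deviations of $N$ pairs of numbers
$(x_s, y_s)$ $(s = 1, \ldots, N)$, from their means $\bar{x}$, $\bar{y}$ is given by

$$\sum_s (x_s-\bar{x})(y_s-\bar{y}) = \sum_s x_s y_s - TT'/N, \qquad (18)$$

where $T = \sum_s x_s$, $T' = \sum_s y_s$.

Applying this formula to the sums in (17) we have

$$C'' = \sum_i\sum_j (x_{ij}-\bar{x})(y_{ij}-\bar{y}) = \sum_i\sum_j x_{ij}y_{ij} - TT'/N, \qquad (19)$$

$T$, $T'$ being the grand totals for $x$ and $y$ respectively. Similarly, on
summing first for the values in the $i$th class, we have

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i) = \sum_i \Bigl(\sum_j x_{ij}y_{ij} - T_iT'_i/n_i\Bigr),$$

where

$$T_i = \sum_j x_{ij}, \qquad T'_i = \sum_j y_{ij},$$

these being the sums of the values of $x$ and $y$ in the $i$th class. Consequently

$$C' = \sum_i\sum_j x_{ij}y_{ij} - \sum_i T_iT'_i/n_i. \qquad (20)$$

Subtraction of (20) from (19) gives the remaining sum

$$C = \sum_i T_iT'_i/n_i - TT'/N, \qquad (21)$$

which may, of course, be obtained independently.

* Wishart and Sanders, 1936, 3, p. 48.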

Since the deviations from the means are independent of the choice 
of origin, the results obtained by using (19), (20) and (21) are 
unaltered by a change of the origins of x and y. In other words, if 
all the values Xfj are decreased (or increased) by the same constant, 
and all the values y^^ by another constant, the results obtained for 
the three sums of products are unchanged. The arithmetic may often 
be simplified in this manner, large numbers being replaced by much 
smaller ones. 
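
The formulas (19), (20) and (21), and the remark on the change of origin, can be checked numerically. The following is a minimal sketch in Python; the function names and the small data set are hypothetical and serve only as an illustration.

```python
def sums_of_products(classes):
    """Return (C, C', C'') for classes given as lists of (x, y) pairs."""
    n_total = sum(len(c) for c in classes)
    t_x = sum(x for c in classes for x, _ in c)      # grand total T
    t_y = sum(y for c in classes for _, y in c)      # grand total T'
    sum_xy = sum(x * y for c in classes for x, y in c)
    # Sum over classes of T_i * T'_i / n_i.
    class_term = sum(
        sum(x for x, _ in c) * sum(y for _, y in c) / len(c) for c in classes
    )
    c_total = sum_xy - t_x * t_y / n_total           # formula (19)
    c_within = sum_xy - class_term                   # formula (20)
    c_between = class_term - t_x * t_y / n_total     # formula (21)
    return c_between, c_within, c_total


def shift_origin(classes, dx, dy):
    """Subtract dx from every x and dy from every y."""
    return [[(x - dx, y - dy) for x, y in c] for c in classes]


# Hypothetical data: three classes of unequal size.
data = [
    [(13, 3.5), (11, 4.0), (18, 2.5)],
    [(10, 4.7), (9, 5.3)],
    [(16, 2.8), (13, 3.0), (15, 2.8), (10, 4.2)],
]

original = sums_of_products(data)
shifted = sums_of_products(shift_origin(data, 12, 3))
print(original)
print(shifted)
```

The two printed triples should agree to within rounding error, and in each the first two entries should add up to the third.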


107. Examination and elimination of the effect of regression
The method of § 103, for testing the significance of an estimated 
regression, consists in resolving the total sum of squares of the y's 
into two components, one due to regression and the other to devia- 
tions from the regression line, and then comparing the estimates 
of population variance derived from these two sums. It will be 
observed that the test was applied to a sample without any attempt 
at classification, that is to say, quite apart from any resolution of 
the variation into components due to factors other than regression. 
In terms of the total sums of squares and products, $A''$, $B''$, $C''$,
the sum of squares due to regression is expressible as

$$N s_y^2 r^2 = C''^2/A'', \tag{22}$$

and that due to deviation from the regression line is $B'' - C''^2/A''$.
The former corresponds to 1 D.F. and the latter to $N-2$, which is
one less than for the total sum of squares.

Now we have seen how, after the values in the sample have been
classified, we may resolve the total variation of the variables, and
their covariation, into components between the means of classes
and within classes respectively. From each set of components we
may calculate an estimate of the regression coefficient and a line
of regression; and the corresponding variation of the $y$'s may be
further resolved by the method of § 103 into a part due to regression,
and another due to deviation from the regression line, the number
of D.F. for the latter being one less than the total D.F. for that
component. Applying this process to each line of the table at the end
of § 105, we find from each a sum of squares of deviations from the
corresponding line of regression, with D.F. as indicated in the
following table:*


Source of          D.F.        Residual sum
variation                      of squares

Between classes    $h-2$       $B - C^2/A$
Within classes     $N-h-1$     $B' - C'^2/A'$
Total              $N-2$       $B'' - C''^2/A''$


leading to different estimates of the variance of $y$ in the population.
Denoting the estimate within classes by $s^2$ we have

$$s^2 = \frac{B' - C'^2/A'}{N-h-1}. \tag{23}$$


This is the estimate of the variance of y after correction for regres- 
sion. The significance of any other estimate of the variance is tested 
by comparison with it. 

The significance of any apparent regression is first tested by the
method of § 103, applied to the sums within classes; that is to say,
we compare the estimate $C'^2/A'$ of variance, due to regression and
based on 1 D.F., with the above estimate based on $N-h-1$ D.F.
If the regression proves to be significant we proceed to test the
differences between class means after correction for regression.
Subtracting the second row of the above table from the third we
have the sum of squares $B + C'^2/A' - C''^2/A''$ with $h-1$ D.F., which
we also compare with the sum of squares within classes. If it proves

* Cf. Wishart and Sanders, 1936, 3, p. 49.












to be significant we conclude that there are differences between the
class means, after these have been adjusted by the regression
coefficient $b'$ within classes.

Now in the first line of the above table we have a sum of squares
between class means, adjusted by the regression coefficient $b$, and
corresponding to $(h-2)$ D.F. The testing of this sum in place of the
above raises the question of the significance of the difference between
$b$ and $b'$. This may be examined as follows. Tabulating the two sums
of squares just mentioned, and their D.F., we may arrange the work:


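Between classes     D.F.     Sum of squares

Adjusted by $b'$    $h-1$    $B + C'^2/A' - C''^2/A''$
Adjusted by $b$     $h-2$    $B - C^2/A$
Difference          1        $C^2/A + C'^2/A' - C''^2/A''$


Now the sum in the last row may be expressed

$$\frac{C^2}{A} + \frac{C'^2}{A'} - \frac{(C+C')^2}{A+A'} = \frac{AA'}{A+A'}\left(\frac{C}{A} - \frac{C'}{A'}\right)^2 = \frac{AA'(b-b')^2}{A+A'}.$$

Comparing this estimate of variance, based on 1 D.F., with the
estimate based on $N-h-1$, we have the variance ratio

$$F = \frac{AA'(b-b')^2}{A+A'} \times \frac{N-h-1}{B' - C'^2/A'}, \qquad \nu_1 = 1, \quad \nu_2 = N-h-1. \tag{24}$$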


A rare value of $F$ indicates that the two coefficients, $b$ and $b'$, are
significantly different; in which case the factor of classification has
an effect upon the degree of association of the variables.

Example. To examine the relation between the yield of grain ($x$ bushels/
acre) and the cost of production ($y$ shillings/bushel) in a certain state, six
districts were chosen at random, and in each a random selection of five farms
was made. The results for the season are tabulated below, columns
corresponding to districts. Is there any significant indication of correlation
between the variables; and, if so, does its value vary with the district?


  x    y       x    y       x    y       x    y       x    y       x    y

 13   3.5     10   4.7     16   2.8      9   4.5     12   3.8     15   4.7
 11   4.0      9   5.3     13   3.0     12   3.6     17   2.2     11   5.1
 18   2.5      8   5.6     15   2.8     10   4.0     19   2.0      9   6.0
 14   3.7     12   4.0     10   4.2     13   3.4     11   4.6     14   4.3
 12   4.3     11   4.4     12   3.2      8   5.5     14   3.4     17   3.9

Shifting the origin to $x = 12$, $y = 3$, and rewriting the table, the student
will easily verify that

$$T_i = 8,\ -10,\ 6,\ -8,\ 13,\ 6; \qquad T = 15,$$

$$T'_i = 3,\ 9,\ 1,\ 6,\ 1,\ 9; \qquad T' = 29,$$

and hence that

$$T^2/N = 7.5, \quad T'^2/N = 28.03, \quad TT'/N = 14.5, \quad \sum_i T_i T'_i/n_i = -8.2,$$

$$\sum_i \sum_j x_{ij}^2 = 259, \quad \sum_i \sum_j y_{ij}^2 = 56.96, \quad \sum_i \sum_j x_{ij} y_{ij} = -54.8.$$

Consequently

$$A = 93.8 - 7.5 = 86.3, \quad A' = 259 - 93.8 = 165.2, \quad A'' = 259 - 7.5 = 251.5,$$

$$B = 41.8 - 28.03 = 13.77, \quad B' = 56.96 - 41.8 = 15.16, \quad B'' = 56.96 - 28.03 = 28.93,$$

$$C = -8.2 - 14.5 = -22.7, \quad C' = -54.8 + 8.2 = -46.6, \quad C'' = -54.8 - 14.5 = -69.3.$$

The tabulated results are:


Source of            D.F.    Sum of squares        Sum of       Coefficient
variation                    ($x^2$)    ($y^2$)    products     of regression

Between districts     5       86.3      13.77      -22.7        -0.263
Within districts     24      165.2      15.16      -46.6        -0.282
Total                29      251.5      28.93      -69.3        -0.276


First, to test the significance of $b'$ we have, as explained above,

$$F = \frac{C'^2/A'}{B' - C'^2/A'} \times (N-h-1) = \frac{13.14 \times 23}{2.02} = 149^{**}, \qquad \nu_1 = 1, \quad \nu_2 = 23.$$

Thus the value of $b'$ is highly significant of negative correlation.

To test the differences of class means, after these have been corrected for
regression, we calculate the estimate of variance

$$\frac{B + C'^2/A' - C''^2/A''}{h-1} = \frac{13.77 + 13.14 - 19.09}{5} = 1.56,$$

and compare this with the estimate within classes

$$s^2 = 2.02/23 = 0.088.$$

The variance ratio is

$$F = 1.56/0.088 = 17.7^{**},$$

which, for $\nu_1 = 5$ and $\nu_2 = 23$, is highly significant. We conclude that the
cost of production, corrected for differences in yield, varies significantly from
district to district.

By using (24) show that $b$ and $b'$ do not differ significantly.
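
The three variance ratios of this example can be recomputed from the sums of squares and products alone. The sketch below is only an illustration: the variable names are invented here, and the input values are the between-district and within-district sums as read from the example (86.3, 13.77, -22.7 and 165.2, 15.16, -46.6).

```python
# Between-district sums A, B, C and within-district sums A', B', C'.
a_b, b_b, c_b = 86.3, 13.77, -22.7
a_w, b_w, c_w = 165.2, 15.16, -46.6
n_pairs, n_classes = 30, 6

# Totals A'', B'', C'' are obtained by addition.
a_t, b_t, c_t = a_b + a_w, b_b + b_w, c_b + c_w

# Residual variance within classes, formula (23).
s2 = (b_w - c_w ** 2 / a_w) / (n_pairs - n_classes - 1)

# Regression within classes: 1 and N-h-1 D.F.
f_regression = (c_w ** 2 / a_w) / s2

# Class means corrected for regression: h-1 and N-h-1 D.F.
adjusted = (b_b + c_w ** 2 / a_w - c_t ** 2 / a_t) / (n_classes - 1)
f_means = adjusted / s2

# Difference between b and b', formula (24): 1 and N-h-1 D.F.
slope_b, slope_w = c_b / a_b, c_w / a_w
f_slopes = (a_b * a_w * (slope_b - slope_w) ** 2 / (a_b + a_w)) / s2

print(round(s2, 4), round(f_regression, 1), round(f_means, 1), round(f_slopes, 2))
```

Because no intermediate rounding is done, the first two ratios come out slightly above the 149 and 17.7 of the text. The third is well below unity, which agrees with the statement that $b$ and $b'$ do not differ significantly.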

108. Two criteria of classification 

Suppose next that the $N$ pairs of values in the sample
may be classified according to two different criteria, $H$ and $K$, the
former determining $h$ different classes and the latter $k$ different
groups. Suppose for simplicity that one pair of values from each
group is present in each class, and vice versa, so that $N = hk$. The
pairs of values may be arranged in a rectangular array of $h$ columns
and $k$ rows, in which columns correspond to classification $H$ and
rows to $K$. The pair $(x_{ij}, y_{ij})$ belongs to the $i$th class and the $j$th
group, and appears in the $i$th column and the $j$th row.

The resolution of the sum of products expressed in (17) is still
valid. But the first sum in the second member of the equation may
be further resolved. Thus if

$$x'_{ij} = x_{ij} - \bar x_i, \qquad y'_{ij} = y_{ij} - \bar y_i,$$

and we apply the resolution expressed by (17) to the sum of products
$\sum_i \sum_j x'_{ij} y'_{ij}$, making it however with respect to the means of
rows instead of the means of columns, we find by the same argument
as in § 100 that

$$\sum_i \sum_j (x_{ij} - \bar x)(y_{ij} - \bar y) = k \sum_i (\bar x_i - \bar x)(\bar y_i - \bar y) + h \sum_j (\bar x_j - \bar x)(\bar y_j - \bar y)$$
$$\qquad + \sum_i \sum_j (x_{ij} - \bar x_i - \bar x_j + \bar x)(y_{ij} - \bar y_i - \bar y_j + \bar y). \tag{25}$$

Denoting the various sums in this equation by $C''$, $C_1$, $C_2$, $C'$ we may
write it briefly

$$C'' = C_1 + C_2 + C'.$$
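
The resolution of the total sum of products into a part between classes, a part between groups and an error part can be verified on any rectangular array. The following sketch assumes one pair of values in each cell; the function name and the numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_products(x, y):
    """x, y: lists of k rows (groups), each with h entries (classes).

    Returns (C1, C2, C_error, C_total) for the two-way resolution.
    """
    k, h = len(x), len(x[0])
    x_bar = mean([v for row in x for v in row])
    y_bar = mean([v for row in y for v in row])
    x_col = [mean([x[j][i] for j in range(k)]) for i in range(h)]  # class means
    y_col = [mean([y[j][i] for j in range(k)]) for i in range(h)]
    x_row = [mean(row) for row in x]                               # group means
    y_row = [mean(row) for row in y]
    cells = [(i, j) for i in range(h) for j in range(k)]
    c1 = k * sum((x_col[i] - x_bar) * (y_col[i] - y_bar) for i in range(h))
    c2 = h * sum((x_row[j] - x_bar) * (y_row[j] - y_bar) for j in range(k))
    c_err = sum(
        (x[j][i] - x_col[i] - x_row[j] + x_bar)
        * (y[j][i] - y_col[i] - y_row[j] + y_bar)
        for i, j in cells
    )
    c_tot = sum((x[j][i] - x_bar) * (y[j][i] - y_bar) for i, j in cells)
    return c1, c2, c_err, c_tot


# Hypothetical array: k = 3 groups (rows) by h = 4 classes (columns).
x = [[12, 15, 11, 14], [10, 13, 9, 15], [14, 16, 12, 13]]
y = [[3.1, 2.4, 3.8, 2.9], [3.6, 2.8, 4.1, 2.5], [2.7, 2.2, 3.3, 3.0]]

c1, c2, c_err, c_tot = two_way_products(x, y)
print(c1, c2, c_err, c_tot)
```

The first three printed values should add up to the fourth, apart from rounding error.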

On the assumption that the pairs of values constitute a random
sample from a homogeneous population, the expected value of the
first member of (25) is $(N-1)\mu_{11}$, where $\mu_{11}$ is the covariance in the
population. That of the first sum in the second member has been
shown to be $(h-1)\mu_{11}$, and similarly that of the second sum is
$(k-1)\mu_{11}$. Consequently the expectation of the final sum is
$(h-1)(k-1)\mu_{11}$.

On dividing the various sums by $(N-1)$, $(h-1)$, $(k-1)$ and
$(h-1)(k-1)$ respectively, we thus obtain four unbiased estimates
of the covariance of the homogeneous population, with D.F.
represented by these divisors. In § 100 we obtained the corresponding
estimates of the variance of $x$, and similar equations give estimates
of the variance of $y$ in the population. From the partial sums we
obtain three independent estimates of the coefficients of regression
and correlation. Denoting the sums of squares of the $x$'s by $A''$, $A_1$,
$A_2$, $A'$ and those of the $y$'s by $B''$, $B_1$, $B_2$, $B'$, we may tabulate
the results:


Source of          D.F.            Sums of squares       Sum of      Coefficient of
variation                          ($x^2$)    ($y^2$)    products    regression

Between classes    $h-1$           $A_1$      $B_1$      $C_1$       $b_1 = C_1/A_1$
Between groups     $k-1$           $A_2$      $B_2$      $C_2$       $b_2 = C_2/A_2$
Error              $(h-1)(k-1)$    $A'$       $B'$       $C'$        $b' = C'/A'$
Total              $N-1$           $A''$      $B''$      $C''$       —


The first step is to test the significance of the regression coefficient
$b'$ obtained from the error sums; and this is done exactly as in § 107,
except that the number of D.F. for error is here $(h-1)(k-1)$. Thus
the residual sum of squares due to error is $B' - C'^2/A'$, with $N-h-k$
D.F., giving the estimate of variance of $y$

$$s^2 = (B' - C'^2/A')/(N-h-k).$$

With this we compare the estimate $C'^2/A'$ obtained from the regression
sum of squares corresponding to 1 D.F., and the result settles
whether $b'$ is significant. If it is, we proceed to test the significance of
the differences of the means of $y$ in classes (or groups) after these have
been corrected for regression on $x$. This may be done as follows. Take
the first and third rows of the above table, and by addition form a
combined source of variation, $S$ = (classes + error), with sums of
squares and products $A$, $B$, $C$. Thus we have:


Source of      D.F.            Sums of squares
variation                      and products

Classes        $h-1$           $A_1$, $B_1$, $C_1$
Error          $(h-1)(k-1)$    $A'$, $B'$, $C'$
Total, $S$     $N-k$           $A$, $B$, $C$


where $A = A_1 + A'$, etc. From these we form the corresponding
residual sums of squares as in § 107, viz.


Source of      D.F.         Residual sum
variation                   of squares

Classes        $h-2$        $B_1 - C_1^2/A_1$
Error          $N-h-k$      $B' - C'^2/A'$
$S$            $N-k-1$      $B - C^2/A$


Subtracting the second row from the third we obtain a residual
sum of squares $B_1 + C'^2/A' - C^2/A$ corresponding to $h-1$ D.F.
Comparing the estimate of variance obtained from this with the
estimate $s^2$ we are able to decide whether the class means differ
significantly after correction for regression.

Incidentally, we may deduce a test for the significance of the
difference $b_1 - b'$. For, on subtracting the residual sums of squares
for classes and error from that for $S$, we have the sum

$$C_1^2/A_1 + C'^2/A' - C^2/A$$

with 1 D.F. As in § 107 this sum is expressible as

$$A_1 A'(b_1 - b')^2/(A_1 + A'),$$

so that in testing it we are testing the difference $b_1 - b'$. Comparing
it with the residual sum of squares for error we have a variance ratio

$$F = \frac{A_1 A'(b_1 - b')^2}{A_1 + A'} \times \frac{N-h-k}{B' - C'^2/A'}, \qquad \nu_1 = 1, \quad \nu_2 = N-h-k. \tag{26}$$

We thus determine whether $b_1$ is significantly different from $b'$;
and we may test $b_2$ by a similar formula.*

* For a numerical illustration see Ex. 15, below.

EXAMPLES XI

1. Verify as follows that the expected value of the final sum in
§ 97 (3) is $(h-1)\sigma^2$. If $\mu$ is the mean of the population,

$$\sum_i n_i(\bar x_i - \bar x)^2 = \sum_i n_i(\bar x_i - \mu)^2 - N(\bar x - \mu)^2.$$

Since $E(\bar x_i - \mu)^2$ is the variance of the mean of a sample of $n_i$ members,
and $E(\bar x - \mu)^2$ is that of the mean of a sample of $N$ members, it follows
that

$$E\Big[\sum_i n_i(\bar x_i - \bar x)^2\Big] = \sum_i n_i(\sigma^2/n_i) - N\sigma^2/N = (h-1)\sigma^2.$$

Verify the mean value of $\sum_i \sum_j (x_{ij} - \bar x_i)^2$ similarly.

2. With the notation of § 100, $Y_{ij} = x_{ij} - \bar x_i - \bar x_j + \bar x$, show that
$\sum_i Y_{ij} = 0 = \sum_j Y_{ij}$. Thus the sum of the $Y$'s in any row or in any
column is zero. Only $(h-1)(k-1)$ of the quantities are independent.

3. To test the significance of the variation of the retail price of a
certain commodity among four large cities, A, B, C and D, seven
shops were chosen at random in each city, and the prices observed
were as follows:

Ai 6s, lOd., 6a. 7d., 6a. Id., Sa. 9d., 5a. 6d., 6a. 3d., 6a. Id. 

B, 7a., 6a. lOd., 6a. 8d>, 6a. 7d., 6a. 4d., 6a. 8d., 6a. 2d, 

C: 7a. 4d., 7a., 6a. 8d., 6a. 8d., Bs. 8d., 6a. 6d., 6a. 6d. 

D: 6a. 7d., 6a. 6d., 6a. 4d., 6a. 2d., 6a., 6a. 8d., 6a. 4d. 

Do the data indicate that the prices for the four cities are
significantly different?
The classes correspond to the cities. Expressing the prices in
pence show that the tabulated results are:


Source of variation    D.F.    Sum of squares    Mean square

Between cities           3          94.96           31.65
Within cities           24        1446.00           60.25
Total                   27        1540.96             —


Thus the mean square between cities is less than that within cities,
and is therefore not significantly large. Neither is it significantly
small because, for $\nu_1 = 24$ and $\nu_2 = 3$, a value of $F$ in the
neighbourhood of 2 is not significant.

Show that the S.E. of the difference of the means for two cities is
4.15d.; and hence for no two cities is the difference of the mean prices
significant.

4. Show that, if $\bar x_1$ and $\bar x_2$ are the means of two classes of $n_1$ and
$n_2$ members respectively, and $s^2$ is the estimate of population variance
derived from error and corresponding to $\nu$ D.F., then $s\sqrt{1/n_1 + 1/n_2}$
is the S.E. of the difference of the means of the two classes, and
hence that the statistic

$$t = \frac{\bar x_1 - \bar x_2}{s\sqrt{1/n_1 + 1/n_2}}$$

is distributed like $t$ for $\nu$ D.F. This formula provides a method of testing
the significance of the difference of the means of two unequal classes.


5. In a certain large country, to test the variation in the price
of a certain commodity with district and season, six districts were
chosen at random, and the price was observed in each on six random
occasions throughout the year. The observations are recorded in pence
in the table below, in which columns correspond to seasons and rows
to districts. Discuss the significance of the variation with season and
with district.


11   15   24   19   21   17
12   16   26   18   22   16
10   13   21   17   20   16
 9   14   18   18   21   13
13   16   20   16   23   14
14   17   23   20   22   16

Show that analysis of variance leads to the following results:


Source of      D.F.    Sum of      Mean square    $F$
variation              squares

Seasons          5     492.33        98.46        55.6**
Districts        5      44.33         8.86         5.0**
Error           25      44.33         1.77          —
Total           35     581.00          —            —


Both the variance ratios are highly significant. Show that the S.E.
of the difference of the means for two seasons or two districts is
0.77; and deduce that the prices in the first, second and last districts
do not differ significantly.

6. An agricultural experiment, on the Latin square plan, gave
the following results for the yield of wheat per acre, letters
corresponding to varieties, columns to treatments and rows to blocks.
Discuss the variation of yield with each of the factors:


A 16    B 10    C 11    D  9    E  9
E 10    C  9    A 14    B 12    D 11
B 15    D  8    E  8    C 10    A 18
D 12    E  6    B 13    A 13    C 12
C 13    A 11    D 10    E  7    B 14


Show that the results obtained by analysis of variance are:


Source of      D.F.    Sum of      Mean square    $F$
variation              squares

Treatments       4      66.56        16.64        37.8**
Blocks           4       2.16         0.54         1.2
Varieties        4     122.56        30.64        69.6**
Error           12       5.28         0.44          —
Total           24     196.56          —            —
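
The analysis of variance of a Latin square such as that of Ex. 6 can be checked with a short script. The sketch below is an illustration only: the layout is the 5 x 5 square of the exercise, with the first entry of the third row read as B 15, the reading that reproduces the tabulated sums; the variable names are invented here.

```python
from math import sqrt

# Rows are blocks, columns are treatments, letters are varieties.
square = [
    [("A", 16), ("B", 10), ("C", 11), ("D", 9), ("E", 9)],
    [("E", 10), ("C", 9), ("A", 14), ("B", 12), ("D", 11)],
    [("B", 15), ("D", 8), ("E", 8), ("C", 10), ("A", 18)],
    [("D", 12), ("E", 6), ("B", 13), ("A", 13), ("C", 12)],
    [("C", 13), ("A", 11), ("D", 10), ("E", 7), ("B", 14)],
]
n = len(square)

grand_total = sum(v for row in square for _, v in row)
correction = grand_total ** 2 / n ** 2            # T^2 / N

ss_total = sum(v * v for row in square for _, v in row) - correction
ss_blocks = sum(sum(v for _, v in row) ** 2 for row in square) / n - correction
ss_treatments = (
    sum(sum(square[r][c][1] for r in range(n)) ** 2 for c in range(n)) / n
    - correction
)

variety_totals = {}
for row in square:
    for letter, value in row:
        variety_totals[letter] = variety_totals.get(letter, 0) + value
ss_varieties = sum(t * t for t in variety_totals.values()) / n - correction

ss_error = ss_total - ss_blocks - ss_treatments - ss_varieties
ms_error = ss_error / ((n - 1) * (n - 2))         # 12 D.F. when n = 5
se_difference = sqrt(2 * ms_error / n)            # S.E. of a difference of two means

print(ss_treatments, ss_blocks, ss_varieties, ss_error, ss_total)
print(ms_error, se_difference)
```

Dividing each of the first three sums of squares by 4 and then by the error mean square gives the variance ratios for treatments, blocks and varieties.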


Show also that the S.E. of the difference of the means of two
classes is 0.42; and hence that for no two blocks do the means differ
significantly.

7. A random sample of 100 pairs of values from a normal population,
grouped in 10 arrays of $y$'s, gave a correlation ratio of $y$ on $x$
equal to 0.3. Show that this value is not significant of association
between the variables.

8. A random sample of 150 pairs from a bivariate normal population
when grouped in 15 arrays of $y$'s gave values $r = 0.4$ and
$\eta_{yx} = 0.5$. Show that these results are consistent with the
assumption of linearity of regression of $y$ on $x$.

9. A parabola, fitted to a random sample of 45 pairs of values
from a normal population, gave an index of correlation equal to 0.3.
Show that this value is not significant of parabolic regression. Also
show that it would not be significant for a sample of less than
83 pairs.

10. Give a direct proof that the expected value of the last sum
in § 105 (17) is $(h-1)\mu_{11}$.

11. Give a direct proof of the formula § 106 (21) for $C$.

12. In the example of § 107, show that the sum of squares represented
by $B - C^2/A$, which corresponds to 4 D.F., is highly significant.
Hence the deviations of the means of districts from the line of
regression fitted to them are too great to be attributed to chance.

13. Supply the details of the proof of § 108 (25).

14. With the notation of § 107 the difference of the means of the
$y$'s for the $p$th and $q$th classes, when corrected for regression, is

$$\bar y_p - \bar y_q - b'(\bar x_p - \bar x_q) \tag{i}$$

which consists of two independent parts. The estimated variance of
the first part is $2s^2/k$, where $k$ is the number $n_i$ in each class, and
$s^2$ is the error mean square $(B' - C'^2/A')/(N-h-1)$ in the analysis
of residual variance. The estimated variance of $b'$ is $s^2/A'$. Hence
show that the estimated variance of the difference (i) is

$$s^2[2/k + (\bar x_p - \bar x_q)^2/A']. \tag{ii}$$

The quotient of the difference (i) by the square root of the expression
(ii) is distributed like $t$ with $N-h-1$ D.F. (Cf. Wishart, 1936, 2;
also Wishart and Sanders, 1936, 3, pp. 53-4.)

15. Examine, as in § 108, the covariation between the yields of
grain and straw, as indicated by the following data.* Each of 5
blocks was divided into 5 plots, and 5 different treatments (A, B, ...,
E) were distributed at random among the plots of each block. The
columns correspond to blocks, and the yields of grain and straw are
denoted by $x$ and $y$ respectively:


        x    y      x    y      x    y      x    y      x    y

A:     65   32     68   26     66   32     71   26     62   33
B:     76   38     64   20     71   32     71   28     64   29
C:     72   33     69   30     69   38     69   30     61   30
D:     70   29     69   27     72   26     70   24     70   31
E:     70   37     67   29     62   33     66   23     66   40


The factors of classification are blocks and treatments. Diminishing
all the $x$'s by 66 and all the $y$'s by 26, verify that

$$T_i = 22,\ -3,\ -1,\ 17,\ -7; \qquad T = 28,$$

$$T'_i = 39,\ 2,\ 31,\ 1,\ 33; \qquad T' = 106,$$

and calculate the values of $T_j$ and $T'_j$. Hence show that the sums of
squares and products are given by:


Source of      D.F.    Sum of squares        Sum of       Coefficient
variation              ($x^2$)    ($y^2$)    products     of regression

Blocks           4     135.04     265.76       2.68
Treatments       4      98.24      87.36
Error           16     455.36     211.44
Total           24     688.64     564.56     113.28


The estimate of variance obtained from the residual sum of squares for
error is

$$s^2 = \frac{144.34}{N-h-k} = \frac{144.34}{15} = 9.62.$$

To test the significance of $b'$ we compare with $s^2$ the estimate
$C'^2/A' = 67.1$ corresponding to 1 D.F. The variance ratio is

$$F = 67.1/9.62 = 6.97 \qquad (\nu_1 = 1, \quad \nu_2 = 15).$$

* Data adapted from Fisher and Eden, Journ. Agric. Sci., vol. 17 (1927),
p. 548.

Thus the value of $b'$ is decidedly significant of positive correlation
between $x$ and $y$, apart from the factors of classification. To test
whether the yield of straw, corrected for yield of grain, varies
significantly with the treatment, we calculate

$$C = C_2 + C' = 110.6, \qquad A = A_2 + A' = 553.6$$

and compare the estimate of variance $(B_2 + C'^2/A' - C^2/A)/(k-1)$
based on 4 D.F. with $s^2$ based on 15. The former has the value

$$(87.36 + 67.1 - 22.1)/4 = 33.09,$$

and the latter is 9.62. The variance ratio is

$$F = 33.09/9.62 = 3.44$$

which, for $\nu_1 = 4$ and $\nu_2 = 15$, is significant at the 5% level. We
conclude that the yield of straw, after correction for yield of grain,
does vary significantly with the treatment.

Show that the difference $b' - b_2$, tested by § 108 (26), is decidedly
significant.







CHAPTER XII


MULTIVARIATE DISTRIBUTIONS. 
PARTIAL AND MULTIPLE CORRELATIONS 

109. Introductory. Yule's notation 
In considering a bivariate distribution we saw that, when it is 
known that the values of one variable are influenced by those of 
another, the coefficient of correlation provides a useful measure of 
the degree of association between them. But it often happens that 
the values of a variable are influenced by those of several others. 
It is known, for instance, that the statures of men are influenced 
by those of their ancestors; and the yield of grain is affected by the 
amounts of different fertilizers used. In such cases our data usually 
constitute a distribution of values of several variables. If we are 
concerned with the combined influence of a group of variables upon 
a variable not included in that group, our study is that of multiple 
regression and multiple correlation. If, however, we wish to examine
the effect of one variable upon a second, after eliminating the effects
of other variables, our problem is that of partial correlation. The
analysis involved is rendered simple and compact by a notation due
to Yule,* which has gained a fairly wide acceptance in recent years.
Before applying this to the more general case of several variables,
we shall pave the way for the student by reviewing the results
obtained for two variables in Chapter IV, and expressing them in
Yule’s notation.

Let the two variables, measured from their means, be $x_1$ and $x_2$,
with standard deviations $\sigma_1$ and $\sigma_2$. If the lines of regression of
$x_1$ on $x_2$, and of $x_2$ on $x_1$, are

$$x_1 = b_{12} x_2, \qquad x_2 = b_{21} x_1, \tag{1}$$

respectively, the residuals, or errors of estimate of the variables
incurred by using these equations, are expressed by

$$x_{1.2} = x_1 - b_{12} x_2, \qquad x_{2.1} = x_2 - b_{21} x_1. \tag{2}$$


* Yule, 1907, 1.

These residuals are the deviations of the representative points from
the corresponding line of regression. The values of the coefficients
of regression, $b_{12}$ and $b_{21}$, are obtained by minimizing the sums of
squares of the residuals; and this is done by equating to zero the
partial derivatives of these sums with respect to $b_{12}$ and $b_{21}$, and the
constant terms. This leads to the normal equations

$$\sum (x_1 - b_{12} x_2) = 0, \qquad \sum x_2 (x_1 - b_{12} x_2) = 0 \tag{3}$$

in the one case, and

$$\sum (x_2 - b_{21} x_1) = 0, \qquad \sum x_1 (x_2 - b_{21} x_1) = 0 \tag{3'}$$

in the other. The first equation in each case expresses the fact that
the mean of the distribution is on the line of regression; and, with
the mean as origin, the equations of these lines have the simple
form (1). The above normal equations, expressed in the more
compact notation, are

$$\sum x_{1.2} = 0, \qquad \sum x_2 x_{1.2} = 0 \tag{3}$$

and

$$\sum x_{2.1} = 0, \qquad \sum x_1 x_{2.1} = 0, \tag{3'}$$

respectively, the summation including all pairs of values of the
distribution. It will be observed that, whereas in Chapter IV
subscripts were used to distinguish the pairs of values in the
distribution, they are here used to distinguish the different variables.
They are no longer needed for the former purpose, since $\sum$ will throughout
denote summation over the whole distribution.

From (3) and (3') we have the familiar values of the coefficients
of regression,

$$b_{12} = \sum x_1 x_2 \Big/ \sum x_2^2, \qquad b_{21} = \sum x_1 x_2 \Big/ \sum x_1^2.$$

The coefficient of correlation, which we denote by $r_{12}$ or $r_{21}$, is
such that

$$r_{12}^2 = b_{12} b_{21}, \tag{4}$$

and

$$b_{12} = r_{12}\sigma_1/\sigma_2, \qquad b_{21} = r_{21}\sigma_2/\sigma_1, \tag{5}$$

$r_{12}$ having the same sign as $b_{12}$ and $b_{21}$, which is the sign of $\sum x_1 x_2$.
It should be remembered that, although $r_{12} = r_{21}$, the values of $b_{12}$
and $b_{21}$ are in general different. Lastly the mean squares of the
deviations (2) are denoted by $\sigma_{1.2}^2$ and $\sigma_{2.1}^2$ respectively. In virtue
of § 26 (21) and (23), these are given by

$$\sigma_{1.2}^2 = \frac{1}{N}\sum x_{1.2}^2 = \sigma_1^2(1 - r_{12}^2),$$
$$\sigma_{2.1}^2 = \frac{1}{N}\sum x_{2.1}^2 = \sigma_2^2(1 - r_{12}^2). \tag{6}$$

The quantities $\sigma_{1.2}$ and $\sigma_{2.1}$ are referred to as standard deviations of
the first order, the order being the number of subscripts following the
point. These are called secondary subscripts, while primary subscripts
are those preceding the point. The standard deviations $\sigma_1$ and $\sigma_2$ are of
zero order. The residuals $x_{1.2}$ and $x_{2.1}$ are deviations of the first order.

The regression of $x_1$ on $x_2$ is said to be linear, if the mean of each
array of $x_1$'s is on the line of regression.
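
The relations of this section between the regression coefficients, the correlation coefficient and the first-order residuals can be checked on a small sample. The sketch below uses hypothetical numbers, and the variable names are chosen here only to mirror Yule's subscripts.

```python
from math import sqrt

# Hypothetical paired observations.
raw1 = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
raw2 = [1.0, 3.0, 2.0, 6.0, 5.0, 8.0]
n = len(raw1)

# Measure both variables from their means.
m1, m2 = sum(raw1) / n, sum(raw2) / n
x1 = [v - m1 for v in raw1]
x2 = [v - m2 for v in raw2]

s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))

b12, b21 = s12 / s22, s12 / s11                 # regression coefficients
r12 = s12 / sqrt(s11 * s22)                     # correlation coefficient
x1_2 = [a - b12 * b for a, b in zip(x1, x2)]    # residuals x_{1.2}

var1 = s11 / n                                  # sigma_1 squared
var1_2 = sum(e * e for e in x1_2) / n           # sigma_{1.2} squared

print(sum(x1_2), sum(b * e for b, e in zip(x2, x1_2)))  # both sums vanish
print(r12 ** 2, b12 * b21)                              # equal
print(var1_2, var1 * (1 - r12 ** 2))                    # equal
```

Each printed pair illustrates one statement of the section: the normal equations for the residual, the relation between $r_{12}$ and the two regression coefficients, and the value of the first-order variance.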


110. Distribution of three or more variables 
Consider next a distribution of three variables which, measured
from their means, are $x_1$, $x_2$ and $x_3$. The data consist of $N$ sets of
corresponding values of the three variables. We enquire first as to
the best estimate of $x_1$ that can be obtained from the data in the
form of a linear function of $x_2$ and $x_3$. The regression equation is
thus of the form

$$x_1 = a + b_{12.3} x_2 + b_{13.2} x_3.$$

The 'best' estimate is interpreted in accordance with the principle
of least squares, the constants being chosen so as to make the sum
of the squares of the errors of estimate a minimum. The subscripts
to the coefficients are written down on the following principle. The
first is that of the variable for which the estimate is being found,
and the second is that of the variable which the coefficient multiplies.
These are primary subscripts. Separated from them by a point are
the subscripts of the other variables that enter into the equation.
These are secondary subscripts, and their number determines the
order of the regression coefficient. In the above equation the
coefficients are partial regression coefficients of the first order. The
significance of the term 'partial' will be explained shortly. The
constant $a$ has been given no subscripts, because it will appear
immediately that its value is zero.

The sum of the squares of the residuals to be minimized is

$$\sum (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3)^2.$$

Equating to zero its partial derivatives with respect to $a$ and the
$b$'s we have the normal equations

$$\sum (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3) = 0,$$
$$\sum x_2 (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3) = 0,$$
$$\sum x_3 (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3) = 0.$$

Since the mean of each variable is zero, the first of these equations
gives $a = 0$; and the regression equation of $x_1$ on $x_2$ and $x_3$ is simply

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3. \tag{7}$$

The other two normal equations may then be written more concisely

$$\sum x_2 x_{1.23} = 0 = \sum x_3 x_{1.23}, \tag{8}$$

in which $x_{1.23}$, defined by

$$x_{1.23} = x_1 - b_{12.3} x_2 - b_{13.2} x_3, \tag{9}$$

is the residual, or error of estimate of $x_1$ from the regression equation
(7). Its mean is zero, since those of the other variables are zero.
Hence the S.D. of this residual, denoted by $\sigma_{1.23}$, is given by

$$N\sigma_{1.23}^2 = \sum x_{1.23}^2, \tag{10}$$

the summation covering the whole distribution, whose total
frequency is $N$. This is a S.D. of order two, since it has two secondary
subscripts.

Similarly, we have the regression equation of $x_2$ on $x_1$ and $x_3$,

$$x_2 = b_{21.3} x_1 + b_{23.1} x_3, \tag{11}$$

for which the error of estimate is

$$x_{2.13} = x_2 - b_{21.3} x_1 - b_{23.1} x_3, \tag{12}$$

and the normal equations

$$\sum x_1 x_{2.13} = 0 = \sum x_3 x_{2.13}. \tag{13}$$

There are also similar equations for the regression of $x_3$ on $x_1$ and $x_2$.
In general the coefficients $b_{12.3}$ and $b_{21.3}$ are different.

The normal equations (3), (8) and (13) express that the sum of the
products of corresponding values of a variable and a residual is zero,
when the subscript of the variable is included among the secondary
subscripts of the residual, the summation covering the whole
distribution. Further, if $x_{1.2}$ is the residual defined by (2), we have

$$\sum x_{1.23} x_{1.2} = \sum x_{1.23}(x_1 - b_{12} x_2) = \sum x_{1.23} x_1.$$

Similarly,

$$\sum x_{1.23} x_{1.23} = \sum x_{1.23}(x_1 - b_{12.3} x_2 - b_{13.2} x_3) = \sum x_{1.23} x_1$$

in virtue of (8). Thus it is evident that the sum of the products of two
residuals is unaltered by omitting from one of the factors any secondary
subscripts which are common to both. In virtue of this result and the
normal equations it follows that the sum of the products of two residuals
is zero if all the subscripts of the one are included among the secondary
subscripts of the other.

The argument of this section is also applicable to the case of $n$
variables $x_1, x_2, \ldots, x_n$. In the regression equation of $x_1$ on $x_2, \ldots, x_n$,
which corresponds to (7), the coefficients have each $n-2$ secondary
subscripts, and are therefore of that order. The residual $x_{1.23 \ldots n}$
corresponding to (9) is of order $n-1$, as is also its S.D. $\sigma_{1.23 \ldots n}$.
The normal equations are

$$\sum x_i x_{1.23 \ldots n} = 0 \qquad (i = 2, 3, \ldots, n).$$

And the theorem concerning the omission of common secondary
subscripts is equally valid in the case of $n$ variables.
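
The two theorems on sums of products of residuals can be illustrated numerically for three variables. The following sketch uses a hypothetical sample; the first-order coefficients are found by solving the two normal equations directly, and all names are chosen here for the illustration.

```python
# Hypothetical observations of three variables.
raw = [
    (10.0, 3.0, 7.0), (12.0, 5.0, 6.0), (9.0, 2.0, 8.0),
    (15.0, 6.0, 9.0), (11.0, 4.0, 5.0), (14.0, 7.0, 8.0), (13.0, 4.0, 10.0),
]
n = len(raw)
means = [sum(row[i] for row in raw) / n for i in range(3)]
x1, x2, x3 = ([row[i] - means[i] for row in raw] for i in range(3))


def dot(u, v):
    """Sum of products of corresponding values."""
    return sum(a * b for a, b in zip(u, v))


# Solve the two normal equations for b_{12.3} and b_{13.2}.
det = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b12_3 = (dot(x1, x2) * dot(x3, x3) - dot(x1, x3) * dot(x2, x3)) / det
b13_2 = (dot(x1, x3) * dot(x2, x2) - dot(x1, x2) * dot(x2, x3)) / det

# Second-order residual x_{1.23} and first-order residuals x_{1.2}, x_{2.3}.
x1_23 = [a - b12_3 * b - b13_2 * c for a, b, c in zip(x1, x2, x3)]
b12 = dot(x1, x2) / dot(x2, x2)
x1_2 = [a - b12 * b for a, b in zip(x1, x2)]
b23 = dot(x2, x3) / dot(x3, x3)
x2_3 = [b - b23 * c for b, c in zip(x2, x3)]

print(dot(x2, x1_23), dot(x3, x1_23))                    # normal equations: both zero
print(dot(x1_23, x1_2), dot(x1_23, x1), dot(x1_23, x1_23))  # all three equal
print(dot(x1_23, x2_3))                                  # zero
```

The last line is an instance of the second theorem: both subscripts of $x_{2.3}$ occur among the secondary subscripts of $x_{1.23}$.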

111. Determination of the coefficients of regression 
The regression equation (7) may be interpreted as representing a
plane, called the plane of regression of $x_1$ on $x_2$ and $x_3$. The normal
equations (8) may be expressed

$$\sigma_1\sigma_2 r_{12} - b_{12.3}\sigma_2^2 - b_{13.2}\sigma_3\sigma_2 r_{32} = 0,$$
$$\sigma_1\sigma_3 r_{13} - b_{12.3}\sigma_2\sigma_3 r_{23} - b_{13.2}\sigma_3^2 = 0, \tag{14}$$

where $r_{12}$ is the correlation between $x_1$ and $x_2$, obtained by ignoring
the values of $x_3$. This is called the total correlation of $x_1$ and $x_2$.
Similarly, $r_{23}$ is the total correlation of $x_2$ and $x_3$, and so on. These
coefficients are symmetrical in the subscripts. We may eliminate
the $b$'s between (7) and (14) by equating to zero the determinant of
their coefficients and the remaining terms. Dividing the second
and third rows of this determinant by $\sigma_2$ and $\sigma_3$ respectively, and
then the first, second and third columns by $\sigma_1$, $\sigma_2$ and $\sigma_3$
respectively, we obtain the regression equation of $x_1$ on the other variables
in the form

$$\begin{vmatrix} x_1/\sigma_1 & x_2/\sigma_2 & x_3/\sigma_3 \\ r_{12} & 1 & r_{32} \\ r_{13} & r_{23} & 1 \end{vmatrix} = 0. \tag{15}$$

If then $\omega$ is used to denote the determinant

$$\omega = \begin{vmatrix} 1 & r_{21} & r_{31} \\ r_{12} & 1 & r_{32} \\ r_{13} & r_{23} & 1 \end{vmatrix} \tag{16}$$

and $\omega_{ij}$ is the cofactor of the element in the $i$th column and the
$j$th row, we may write (15) as

$$\omega_{11}\frac{x_1}{\sigma_1} + \omega_{12}\frac{x_2}{\sigma_2} + \omega_{13}\frac{x_3}{\sigma_3} = 0. \tag{17}$$

From this form of the regression equation it follows that

$$b_{12.3} = -\frac{\sigma_1\omega_{12}}{\sigma_2\omega_{11}}, \qquad b_{13.2} = -\frac{\sigma_1\omega_{13}}{\sigma_3\omega_{11}}. \tag{18}$$

The regression of $x_1$ on $x_2$ and $x_3$ is said to be linear if the mean of
each array of $x_1$'s lies on the plane of regression (17).

The residual $x_{1.23}$ may be looked upon as the deviation of the
representative point from the plane of regression. The term deviation
is therefore often used in place of residual. The variance $\sigma_{1.23}^2$ of
$x_{1.23}$ may be expressed in terms of $\omega$, $\omega_{11}$ and $\sigma_1$. For

$$N\sigma_{1.23}^2 = \sum x_{1.23}^2 = \sum x_1 x_{1.23} = \sum x_1(x_1 - b_{12.3} x_2 - b_{13.2} x_3).$$

This is equivalent to

$$(\sigma_1^2 - \sigma_{1.23}^2) - b_{12.3}\sigma_1\sigma_2 r_{21} - b_{13.2}\sigma_1\sigma_3 r_{31} = 0,$$

and, on eliminating the $b$'s between this equation and (14), we have
a result which may be expressed

$$\begin{vmatrix} 1 - \sigma_{1.23}^2/\sigma_1^2 & r_{21} & r_{31} \\ r_{12} & 1 & r_{32} \\ r_{13} & r_{23} & 1 \end{vmatrix} = 0,$$

or

$$\omega - \omega_{11}\sigma_{1.23}^2/\sigma_1^2 = 0.$$

Consequently

$$\sigma_{1.23}^2 = \omega\sigma_1^2/\omega_{11}. \tag{19}$$

The above argument clearly holds for the more general case of
$n$ variables. In place of (16) we then have a determinant $\omega$ of order
$n$; and the regression equation of $x_1$ on $x_2, \ldots, x_n$ is

$$\omega_{11}\frac{x_1}{\sigma_1} + \omega_{12}\frac{x_2}{\sigma_2} + \ldots + \omega_{1n}\frac{x_n}{\sigma_n} = 0. \tag{20}$$

From this we obtain, as in (18), the regression coefficients of order
$n-2$. In place of (19) we have an equation giving the variance of
the deviation $x_{1.23 \ldots n}$, namely,

$$\sigma_{1.23 \ldots n}^2 = \omega\sigma_1^2/\omega_{11}. \tag{21}$$

Example. Let $x_1$, $x_2$, $x_3$ in. be the excesses of the heights of father, mother
and son respectively above their mean values. A distribution of these variables
gave the following approximate correlations and standard deviations:*

$$r_{12} = 0.28, \qquad r_{23} = 0.49, \qquad r_{31} = 0.51,$$

$$\sigma_1 = 2.7, \qquad \sigma_2 = 2.4, \qquad \sigma_3 = 2.7.$$

Show that the regression equation of $x_3$ on $x_1$ and $x_2$ is

$$x_3 = 0.40x_1 + 0.42x_2,$$

and deduce that, if the mean heights of father, mother and son are 67.68,
62.48 and 68.65 in. respectively, the regression equation for actual heights
$X_1$, $X_2$, $X_3$ is

$$X_3 = 15.3 + 0.40X_1 + 0.42X_2.$$

Also show that $\sigma_{3.12} = 2.1$.

* Modified data from Pearson and Lee» Biometrihaf vol. 2 (1903)* pp. 
357*462. 
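The working of this example can be sketched numerically from (18) and (19), with the subscript 1 replaced by the variable being estimated. The Python fragment below is only an illustration: the function names and the 0-based indexing (0 = father, 1 = mother, 2 = son) are assumptions of the sketch, and the correlations are assigned to the pairs as read above.

```python
import math

def cofactor3(m, i, j):
    """Cofactor of element (i, j) of a 3x3 matrix m (0-based indices)."""
    rows = [a for a in range(3) if a != i]
    cols = [b for b in range(3) if b != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def det3(m):
    """Determinant by expansion along the first row."""
    return sum(m[0][j] * cofactor3(m, 0, j) for j in range(3))

def regression_on_others(r, sigma, i):
    """Partial regression coefficients of x_i on the other two variables,
    as in (18), and the s.d. of the residual, as in (19)."""
    w = det3(r)
    w_ii = cofactor3(r, i, i)
    coeffs = {j: -sigma[i] * cofactor3(r, i, j) / (sigma[j] * w_ii)
              for j in range(3) if j != i}
    residual_sd = sigma[i] * math.sqrt(w / w_ii)
    return coeffs, residual_sd

# Correlations, standard deviations and means of the example.
r = [[1.0, 0.28, 0.51],
     [0.28, 1.0, 0.49],
     [0.51, 0.49, 1.0]]
sigma = [2.7, 2.4, 2.7]
means = [67.68, 62.48, 68.65]

coeffs, residual_sd = regression_on_others(r, sigma, 2)
# Constant term for actual heights, using coefficients rounded to two places.
intercept = means[2] - sum(round(b, 2) * means[j] for j, b in coeffs.items())
print(coeffs, intercept, residual_sd)
```

With unrounded coefficients the constant term comes out somewhat smaller, so the rounding step is kept explicit in the sketch.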
112. Multiple correlation

We proceed to find the correlation between the variable $x_1$ and
its estimate from the regression equation (7). This correlation,
which is an indication of the agreement between $x_1$ and its estimate,
is called the coefficient of multiple correlation between $x_1$ and the two
variables $x_2$ and $x_3$, and is denoted* by $R_{1(23)}$. It may be determined
as follows. If $e_{1.23}$ is the estimate of $x_1$ given by (7), we have

$$e_{1.23} = b_{12.3}x_2 + b_{13.2}x_3$$

or $$e_{1.23} = x_1 - x_{1.23}. \qquad (22)$$

The mean value of $e_{1.23}$ is zero, since those of $x_1$ and $x_{1.23}$ are both
zero. The sum of the products of $x_1$ and $e_{1.23}$ is

$$\sum x_1 e_{1.23} = \sum x_1(x_1 - x_{1.23}) = \sum x_1^2 - \sum x_{1.23}^2.$$

Also $$\sum e_{1.23}^2 = \sum (x_1 - x_{1.23})^2 = \sum x_1^2 - \sum x_{1.23}^2.$$

Consequently the coefficient of correlation between $x_1$ and $e_{1.23}$ is
given by

$$R_{1(23)} = \frac{\sigma_1^2 - \sigma_{1.23}^2}{\sigma_1\sqrt{(\sigma_1^2 - \sigma_{1.23}^2)}} = \frac{1}{\sigma_1}\sqrt{(\sigma_1^2 - \sigma_{1.23}^2)}.$$

This is the required coefficient of multiple correlation. The result
may be expressed

$$\sigma_{1.23}^2 = \sigma_1^2(1 - R_{1(23)}^2), \qquad (23)$$

which is analogous to (6) and to § 41 (46). Comparing (23) and (19)
we see that

$$1 - R_{1(23)}^2 = \omega/\omega_{11}, \qquad (24)$$

whence

$$R_{1(23)}^2 = 1 - \frac{\omega}{\omega_{11}} = 1 - \frac{\omega}{1 - r_{23}^2} = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2}, \qquad (25)$$

which expresses the multiple correlation in terms of the total
correlations between the pairs of variables.


* Some writers prefer the notation $R_{1.23}$.

We may note that $R_{1(23)}$ is never negative. For, from the above
argument, $\sum x_1 e_{1.23} = \sum e_{1.23}^2$, which cannot be negative. Further,
when $R_{1(23)} = 1$ it follows from (23) that $\sigma_{1.23}^2 = 0$, which requires
all the deviations $x_{1.23}$ to be zero, so that $x_1$ is given accurately by
the regression equation. In this case $x_1$ is a linear function of $x_2$
and $x_3$.
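The definition of the multiple correlation as the correlation between $x_1$ and its estimate may be compared numerically with formula (25). The following Python sketch does this on a small made-up data set; the data and all function names are assumptions introduced for illustration only.

```python
import math

def centred(values):
    """Return the values measured from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(u, v):
    """Total (product-moment) correlation of two equal-length lists."""
    u, v = centred(u), centred(v)
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def multiple_r_from_totals(r12, r13, r23):
    """R_1(23) from the total correlations, formula (25)."""
    return math.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))

def multiple_r_from_data(x1, x2, x3):
    """R_1(23) as the correlation between x1 and its regression estimate."""
    x1, x2, x3 = centred(x1), centred(x2), centred(x3)
    s22, s33, s23 = dot(x2, x2), dot(x3, x3), dot(x2, x3)
    s12, s13 = dot(x1, x2), dot(x1, x3)
    d = s22 * s33 - s23**2
    b12_3 = (s12 * s33 - s13 * s23) / d  # solution of the two normal equations
    b13_2 = (s13 * s22 - s12 * s23) / d
    estimate = [b12_3 * a + b13_2 * b for a, b in zip(x2, x3)]
    return corr(x1, estimate)

# Made-up illustrative data, not taken from the text.
x1 = [2.0, 4.0, 5.0, 4.0, 7.0, 8.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

by_definition = multiple_r_from_data(x1, x2, x3)
by_formula = multiple_r_from_totals(corr(x1, x2), corr(x1, x3), corr(x2, x3))
print(by_definition, by_formula)
```

The two values should agree to rounding error, and neither can be negative, in keeping with the remark above.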

The argument is also valid for a distribution of $n$ variables. The
multiple correlation $R_{1(23\dots n)}$ is the correlation between $x_1$ and its
estimate $e_{1.23\dots n}$ from the regression equation (20); and the argument
leads to the corresponding formulae

$$\sigma_{1.23\dots n}^2 = \sigma_1^2(1 - R_{1(23\dots n)}^2) \qquad (23')$$

and $$1 - R_{1(23\dots n)}^2 = \omega/\omega_{11}. \qquad (24')$$

If the multiple correlation is equal to unity, $x_1$ is a linear function
of $x_2, \dots, x_n$.

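Example 1. Prove the following relations:

$$\sum e_{1.23}x_{1.23} = 0,$$

$$\sum e_{1.23}^2 = b_{12.3}\sum x_1x_2 + b_{13.2}\sum x_1x_3,$$

$$R_{1(23)}^2 = \Big(b_{12.3}\sum x_1x_2 + b_{13.2}\sum x_1x_3\Big)\Big/\sum x_1^2.$$

Example 2. Show that, for the example in the preceding section,

$R_{3(12)} =$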


113. Partial correlation 

Consider next the correlation between the deviations $x_{1.3}$ and
$x_{2.3}$. Since $x_{1.3}$ is the deviation of $x_1$ from its estimate in terms of
$x_3$, we may regard it as that part of the variable $x_1$ which remains
after the influence of $x_3$ has been eliminated, as far as can be done by
a linear equation. A similar interpretation can be given to $x_{2.3}$.
Hence the correlation between these deviations may be looked upon
as the correlation between $x_1$ and $x_2$ after the influence of $x_3$ has been
eliminated. We denote this correlation by $r_{12.3}$, and call it the partial
correlation between $x_1$ and $x_2$ in the trivariate distribution. Having
one secondary subscript, $r_{12.3}$ is a partial correlation of the first
order. It will be remembered that, in calculating the total correlation
$r_{12}$, the values of $x_3$ are simply ignored. It is therefore a correlation
calculated on the assumption that the variables $x_1$ and $x_2$ are
influenced only by each other, and not by any other variable.

To find the partial correlation $r_{12.3}$ we observe that, by the
theorems of § 110,

$$0 = \sum x_{2.3}x_{1.23} = \sum x_{2.3}(x_1 - b_{12.3}x_2 - b_{13.2}x_3)$$

$$= \sum x_1x_{2.3} - b_{12.3}\sum x_2x_{2.3} = \sum x_{1.3}x_{2.3} - b_{12.3}\sum x_{2.3}^2,$$

so that

$$b_{12.3} = \sum x_{1.3}x_{2.3}\Big/\sum x_{2.3}^2. \qquad (26)$$

From this result it follows that $b_{12.3}$ is the coefficient of regression
of $x_{1.3}$ on $x_{2.3}$. Similarly, $b_{21.3}$ is the coefficient of regression of $x_{2.3}$
on $x_{1.3}$; and the coefficient of correlation between these deviations
is therefore given by

$$r_{12.3}^2 = b_{12.3}b_{21.3}. \qquad (27)$$

Since $\sigma_{1.3}$ and $\sigma_{2.3}$ are the standard deviations of $x_{1.3}$ and $x_{2.3}$, the
coefficient of partial correlation is connected with the coefficient of
partial regression by the usual formulae

$$b_{12.3} = r_{12.3}\sigma_{1.3}/\sigma_{2.3}; \qquad b_{21.3} = r_{12.3}\sigma_{2.3}/\sigma_{1.3}. \qquad (28)$$

And it is now clear why the above $b$'s are called coefficients of partial
regression. The correlation $r_{12.3}$ has the same sign as $b_{12.3}$, which is
the sign of $-\omega_{12}$ by (18). Also, in virtue of (27), $r_{12.3}$ is symmetrical
in the primary subscripts. If the values of the $b$'s are substituted
from (18) in (27) we obtain

$$r_{12.3}^2 = \frac{\omega_{12}\omega_{21}}{\omega_{11}\omega_{22}} = \frac{\omega_{12}^2}{\omega_{11}\omega_{22}},$$

so that

$$r_{12.3} = \frac{-\omega_{12}}{\sqrt{(\omega_{11}\omega_{22})}} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{\{(1 - r_{13}^2)(1 - r_{23}^2)\}}}. \qquad (29)$$

Similar formulae may be written down for $r_{13.2}$ and $r_{23.1}$. The partial
correlations of the first order are thus expressible in terms of the
total correlation coefficients.
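As an illustration of the two views of $r_{12.3}$ taken above, the following Python sketch computes it first as the correlation between the deviations $x_{1.3}$ and $x_{2.3}$, and then from formula (29). The data are made up for the purpose and the function names are assumptions of the sketch.

```python
import math

def centred(values):
    """Return the values measured from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(u, v):
    """Total correlation of two equal-length lists."""
    u, v = centred(u), centred(v)
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def deviations(y, x):
    """Deviations of y from its linear regression estimate in terms of x."""
    y, x = centred(y), centred(x)
    b = dot(y, x) / dot(x, x)
    return [a - b * c for a, c in zip(y, x)]

def partial_r(r12, r13, r23):
    """First-order partial correlation r_12.3 by formula (29)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Made-up illustrative data, not taken from the text.
x1 = [2.0, 4.0, 5.0, 4.0, 7.0, 8.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

from_deviations = corr(deviations(x1, x3), deviations(x2, x3))
from_formula = partial_r(corr(x1, x2), corr(x1, x3), corr(x2, x3))
print(from_deviations, from_formula)
```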

The partial correlation between $x_1$ and $x_2$ in the trivariate
distribution is sometimes defined as the correlation between $x_1$ and $x_2$
for a constant value of $x_3$. In general, however, this correlation will
depend upon the constant value of $x_3$ selected. In certain special
cases the second definition agrees with the first, and the result is
then the same for all constant values of $x_3$. Necessary and sufficient
conditions* for this agreement may be stated:

(a) In the bivariate distribution of $x_1$ and $x_3$ ($x_2$ being ignored),
the regression of $x_1$ on $x_3$ must be linear, and the standard
deviations of all the $x_1$ arrays ($x_3$ = const.) must be equal.

(b) In the trivariate distribution the regression of $x_1$ on $x_2$ and
$x_3$ must be linear, and the standard deviations of all the $x_1$ arrays
($x_2$ = const., $x_3$ = const.) must be equal.

These conditions are satisfied in the normal trivariate distribution
to be considered in § 116.

For a distribution of $n$ variables there are partial correlations of
all orders from 1 to $n-2$. Thus if $(k)$ denotes a definite group of
secondary subscripts not including $i$ and $j$, the correlation between
the deviations $x_{i.(k)}$ and $x_{j.(k)}$, denoted by $r_{ij.(k)}$, is a partial
correlation of order equal to the number $m$ of subscripts in $(k)$. It is
connected with the regression coefficients for these deviations, and their
standard deviations $\sigma_{i.(k)}$ and $\sigma_{j.(k)}$, by formulae analogous to (27) and
(28), namely

$$r_{ij.(k)}^2 = b_{ij.(k)}b_{ji.(k)} \qquad (30)$$

and $$b_{ij.(k)} = r_{ij.(k)}\sigma_{i.(k)}/\sigma_{j.(k)}. \qquad (31)$$

If $(k)$ includes all the subscripts but $i$ and $j$, we find as in (29)

$$r_{ij.(k)} = -\omega_{ij}/\sqrt{(\omega_{ii}\omega_{jj})},$$

the symbols on the right being cofactors in the determinant $\omega$ of
order $n$.

A partial correlation of order $m+1$ is expressible in terms of
those of order $m$ by an equation of the same form as (29), with a set
$(k)$ of secondary subscripts added to each coefficient in the formula.
Thus

$$r_{ij.h(k)} = \frac{r_{ij.(k)} - r_{ih.(k)}r_{jh.(k)}}{\sqrt{\{(1 - r_{ih.(k)}^2)(1 - r_{jh.(k)}^2)\}}},$$

where $h$, $i$, $j$ are unequal.
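The step from order $m$ to order $m+1$ lends itself to a recursive calculation. The sketch below, in Python, is one possible arrangement and not a prescribed method: the function `partial_corr`, the 0-based subscripts and the matrix of total correlations are all assumptions chosen for illustration.

```python
import math

def partial_corr(r, i, j, given):
    """Partial correlation of variables i and j with the variables listed in
    `given` as secondary subscripts; r is a symmetric matrix of total
    correlations and all indices are 0-based."""
    if not given:
        return r[i][j]
    h, rest = given[0], given[1:]
    # Correlations of one order lower, carrying the remaining subscripts.
    r_ij = partial_corr(r, i, j, rest)
    r_ih = partial_corr(r, i, h, rest)
    r_jh = partial_corr(r, j, h, rest)
    return (r_ij - r_ih * r_jh) / math.sqrt((1 - r_ih**2) * (1 - r_jh**2))

# Illustrative total correlations for four variables (made up).
r = [[1.0, 0.3, 0.2, 0.1],
     [0.3, 1.0, 0.2, 0.1],
     [0.2, 0.2, 1.0, 0.3],
     [0.1, 0.1, 0.3, 1.0]]

first_order = partial_corr(r, 0, 1, (2,))
second_order = partial_corr(r, 0, 1, (2, 3))
same_in_other_order = partial_corr(r, 0, 1, (3, 2))
print(first_order, second_order, same_in_other_order)
```

Since a partial correlation is symmetrical in its secondary subscripts, removing the variables in either order should give the same second-order value.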


Example. Prove that, for the example in § 111,


* For a proof see Camp, 1934, 1, pp. 341-8.

114. Reduction formula for the order of a standard deviation

A s.d. of any order may be expressed in terms of a s.d. and a
correlation coefficient of lower order. Thus, since

$$\sum x_{1.23}^2 = \sum x_{1.3}x_{1.23} = \sum x_{1.3}^2 - b_{12.3}\sum x_{1.3}x_{2.3},$$

we have, on dividing by $N$ and using (26) with subscripts interchanged,

$$\sigma_{1.23}^2 = \sigma_{1.3}^2(1 - b_{12.3}b_{21.3}) = \sigma_{1.3}^2(1 - r_{12.3}^2), \qquad (33)$$

which is of the same form as (6). Since $\sigma_{1.23}^2$ is symmetrical in the
secondary subscripts, the subscripts 2 and 3 may be interchanged
in the second member of (33). Substituting the value of $\sigma_{1.3}^2$ given
by (6) we may also write the result

$$\sigma_{1.23}^2 = \sigma_1^2(1 - r_{13}^2)(1 - r_{12.3}^2). \qquad (34)$$

Similar formulae may be written down by cyclic permutation of the
subscripts. The equations (33) and (34) show how a s.d. may be
expressed in terms of one of lower order, and one or more of the
correlation coefficients. Also from (33) it follows that

$$\sigma_{1.23} \le \sigma_{1.3}.$$

The estimate of $x_1$ from $x_2$ and $x_3$ is thus in general better than the
estimate from $x_3$ alone, being just as good only if $r_{12.3}$ is zero.
Further, on comparing (34) with (23) we see that

$$1 - R_{1(23)}^2 = (1 - r_{13}^2)(1 - r_{12.3}^2), \qquad (35)$$

which is in agreement with the values already found for $R_{1(23)}$ and
$r_{12.3}$ in terms of the total correlations. From (35) it is clear that

$$1 - R_{1(23)}^2 \le 1 - r_{13}^2,$$

so that $$R_{1(23)}^2 \ge r_{13}^2.$$

Since $R_{1(23)}$ is symmetrical in the subscripts 2 and 3, it follows that
this coefficient of multiple correlation is not numerically less than
either $r_{12}$ or $r_{13}$. If then $R_{1(23)}$ is zero, both $r_{12}$ and $r_{13}$ must be zero,
and $x_1$ is then uncorrelated with either $x_2$ or $x_3$.

The same argument shows that, in the case of $n$ variables, the
equation (33) has the generalization

$$\sigma_{1.23(k)}^2 = \sigma_{1.3(k)}^2(1 - r_{12.3(k)}^2), \qquad (36)$$

where $(k)$ denotes as usual a group of secondary subscripts. Repeated
application of this formula leads to a generalization of (34), namely,

$$\sigma_{1.23\dots n}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)\dots(1 - r_{1n.23\dots n-1}^2). \qquad (37)$$

Comparing this with (23') we see that

$$1 - R_{1(23\dots n)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)\dots(1 - r_{1n.23\dots n-1}^2). \qquad (38)$$

If $R_{1(23\dots n)} = 0$ each of the correlations in the second member is
zero, and also each of the coefficients $r_{12}, r_{13}, \dots, r_{1n}$. Thus $x_1$ is then
uncorrelated with any of the other variables.
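The relation (35), and the inference that the multiple correlation is not numerically less than either total correlation, may be checked on figures. The Python sketch below uses illustrative correlations chosen for the purpose; the helper name `partial_r` is an assumption of the sketch.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c, of the form (29)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Illustrative total correlations (assumed values).
r12, r13, r23 = 0.7, 0.6, 0.4

# The determinant of (16) and 1 - R^2 from (24).
omega = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
one_minus_r_squared = omega / (1 - r23**2)

# Second member of (35), and the same with subscripts 2 and 3 interchanged.
product_a = (1 - r13**2) * (1 - partial_r(r12, r13, r23)**2)
product_b = (1 - r12**2) * (1 - partial_r(r13, r12, r23)**2)

multiple_r = math.sqrt(1 - one_minus_r_squared)
print(one_minus_r_squared, product_a, product_b, multiple_r)
```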


115. Reduction formula for the order of a regression coefficient 
To express a coefficient of regression in terms of coefficients of
lower order we may proceed as follows. First

$$\sum x_{1.3}x_{2.3} = \sum x_1(x_2 - b_{23}x_3).$$

Then, since $b_{23} = b_{32}\sigma_2^2/\sigma_3^2$, we may write the above equation, in
virtue of (26),

$$\sigma_{2.3}^2 b_{12.3} = b_{12}\sigma_2^2 - b_{13}b_{32}\sigma_2^2,$$

or, by means of (6),

$$\sigma_2^2(1 - r_{23}^2)b_{12.3} = \sigma_2^2(b_{12} - b_{13}b_{32}).$$

Thus we have the required reduction formula

$$b_{12.3} = \frac{b_{12} - b_{13}b_{32}}{1 - b_{23}b_{32}}. \qquad (39)$$

This may be expressed in terms of correlations. For, in virtue of
(28), it is equivalent to

$$r_{12.3}\frac{\sigma_{1.3}}{\sigma_{2.3}} = \frac{\sigma_1}{\sigma_2}\cdot\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}. \qquad (40)$$

Substituting the values of $\sigma_{1.3}$ and $\sigma_{2.3}$ given by equations of the
form (6), we find (29) again.

In the case of $n$ variables, by similar reasoning, we arrive at the
generalization of (39),

$$b_{12.3(k)} = \frac{b_{12.(k)} - b_{13.(k)}b_{32.(k)}}{1 - b_{23.(k)}b_{32.(k)}},$$

where $(k)$ has the usual significance.
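A numerical check of the reduction formula (39) against the relation (28) may be sketched as follows. The standard deviations and total correlations are illustrative values assumed for the purpose, and the helper names `rho` and `b` are likewise assumptions of this Python sketch.

```python
import math

# Illustrative standard deviations and total correlations (assumed values).
sigma = {1: 3.0, 2: 4.0, 3: 5.0}
r = {(1, 2): 0.7, (1, 3): 0.6, (2, 3): 0.4}

def rho(i, j):
    """Total correlation r_ij, symmetrical in its subscripts."""
    return r[(min(i, j), max(i, j))]

def b(i, j):
    """Total regression coefficient b_ij = r_ij * sigma_i / sigma_j."""
    return rho(i, j) * sigma[i] / sigma[j]

# b_12.3 by the reduction formula (39).
b12_3_reduction = (b(1, 2) - b(1, 3) * b(3, 2)) / (1 - b(2, 3) * b(3, 2))

# b_12.3 by (28), with r_12.3 from (29) and the s.d.'s from formulae like (6).
r12_3 = (rho(1, 2) - rho(1, 3) * rho(2, 3)) / math.sqrt(
    (1 - rho(1, 3)**2) * (1 - rho(2, 3)**2))
sd_1_3 = sigma[1] * math.sqrt(1 - rho(1, 3)**2)
sd_2_3 = sigma[2] * math.sqrt(1 - rho(2, 3)**2)
b12_3_partial = r12_3 * sd_1_3 / sd_2_3

print(b12_3_reduction, b12_3_partial)
```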
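116. Normal distribution

A generalization of the bivariate normal distribution to the case
of three variables may be obtained as follows.* First suppose that
the variables $x_2$ and $x_3$ are normally distributed about zero means
with correlation $r_{23}$. Then the probability that a pair of values chosen
at random will fall in the interval $dx_2\,dx_3$ is

$$dP_1 = \frac{dx_2\,dx_3}{2\pi\sigma_2\sigma_3\sqrt{\omega_{11}}}\exp\left[-\frac{1}{2\omega_{11}}\left(\frac{x_2^2}{\sigma_2^2} - \frac{2r_{23}x_2x_3}{\sigma_2\sigma_3} + \frac{x_3^2}{\sigma_3^2}\right)\right],$$

where $\omega_{11} = 1 - r_{23}^2$. Next assume that the regression of $x_1$ on $x_2$
and $x_3$ is linear, and that in each $x_1$ array the variable is normally
distributed with s.d. which is the same for each of these arrays.
Then, since the mean of each array is on the plane of regression (7),
the s.d. of each array is $\sigma_{1.23}$, given by

$$\sigma_{1.23}^2 = \sigma_1^2\omega/\omega_{11}. \qquad (41)$$

Consequently the probability that $x_1$, chosen at random in an
assigned array, will fall in the interval $dx_1$ is

$$dP_2 = \frac{dx_1}{\sigma_{1.23}\sqrt{(2\pi)}}\exp\left(-\tfrac{1}{2}x_{1.23}^2/\sigma_{1.23}^2\right)$$

$$= \frac{\sqrt{\omega_{11}}\,dx_1}{\sigma_1\sqrt{(2\pi\omega)}}\exp\left[-\frac{1}{2\omega\omega_{11}}\left(\omega_{11}\frac{x_1}{\sigma_1} + \omega_{12}\frac{x_2}{\sigma_2} + \omega_{13}\frac{x_3}{\sigma_3}\right)^2\right]$$

by (41) and (17). On forming the product $dP_1\,dP_2$ we have the
probability that a set of values of the three variables, chosen at
random, will fall in the interval $dx_1\,dx_2\,dx_3$, as

$$dP = \frac{dx_1\,dx_2\,dx_3}{\sigma_1\sigma_2\sigma_3\sqrt{\{(2\pi)^3\omega\}}}\exp\left(-\tfrac{1}{2}\phi\right), \qquad (42)$$

where, after a little reduction, it will be found that

$$\phi = \frac{1}{\omega}\left(\omega_{11}\frac{x_1^2}{\sigma_1^2} + \omega_{22}\frac{x_2^2}{\sigma_2^2} + \omega_{33}\frac{x_3^2}{\sigma_3^2} + 2\omega_{23}\frac{x_2x_3}{\sigma_2\sigma_3} + 2\omega_{31}\frac{x_3x_1}{\sigma_3\sigma_1} + 2\omega_{12}\frac{x_1x_2}{\sigma_1\sigma_2}\right). \qquad (43)$$

The symmetry of (42) and (43) in the subscripts shows that the
properties of the three variables are similar. Thus all the regressions
are linear. The variance of each array of $x_2$'s in the trivariate
distribution is $\omega\sigma_2^2/\omega_{22} = \sigma_{2.31}^2$, and that of each array of $x_3$'s is
$\omega\sigma_3^2/\omega_{33} = \sigma_{3.12}^2$.

* Cf. Rietz, 1927, 2, pp. 106-7.

We may verify the statement made in § 113 that, in the present
case, the correlation between $x_1$ and $x_2$ for a constant value of $x_3$
is the partial correlation $r_{12.3}$. For $\phi$ may be expressed in the form

$$\phi = \frac{1}{\omega}\left[\omega_{11}\frac{(x_1 - b_{13}x_3)^2}{\sigma_1^2} + \omega_{22}\frac{(x_2 - b_{23}x_3)^2}{\sigma_2^2} + \frac{2\omega_{12}}{\sigma_1\sigma_2}(x_1 - b_{13}x_3)(x_2 - b_{23}x_3)\right] + \frac{x_3^2}{\sigma_3^2}, \qquad (44)$$

and, on comparing (42) with § 38 (28), we see that, for a constant
value of $x_3$, $x_1$ and $x_2$ are normally distributed about means $b_{13}x_3$
and $b_{23}x_3$ respectively, with correlation

$$\frac{-\omega_{12}}{\sqrt{(\omega_{11}\omega_{22})}} = r_{12.3}$$

as stated. From (42) and (44) it is also clear that the deviations
$x_{1.3}$ and $x_{2.3}$ are normally distributed with correlation $r_{12.3}$.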

The above results may be extended to the case of n variables.* 
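The agreement of the two forms (43) and (44) of the exponent may be tested at any chosen point. The Python sketch below does so for one set of illustrative constants; the symbol $\phi$ for the exponent, the function names, the 0-based subscripts and the numerical values are all assumptions made for the illustration.

```python
def cofactor(r, i, j):
    """Cofactor of element (i, j) of the 3x3 correlation determinant."""
    rows = [a for a in range(3) if a != i]
    cols = [c for c in range(3) if c != j]
    minor = (r[rows[0]][cols[0]] * r[rows[1]][cols[1]]
             - r[rows[0]][cols[1]] * r[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def det(r):
    return sum(r[0][j] * cofactor(r, 0, j) for j in range(3))

def phi_43(x, s, r):
    """The exponent in the symmetrical form (43)."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            total += cofactor(r, i, j) * x[i] * x[j] / (s[i] * s[j])
    return total / det(r)

def phi_44(x, s, r):
    """The exponent in the form (44), arranged about the means b13*x3, b23*x3."""
    b13 = r[0][2] * s[0] / s[2]
    b23 = r[1][2] * s[1] / s[2]
    u = x[0] - b13 * x[2]
    v = x[1] - b23 * x[2]
    bracket = (cofactor(r, 0, 0) * u * u / s[0]**2
               + cofactor(r, 1, 1) * v * v / s[1]**2
               + 2 * cofactor(r, 0, 1) * u * v / (s[0] * s[1]))
    return bracket / det(r) + x[2]**2 / s[2]**2

# Illustrative constants and test point (assumed values).
r = [[1.0, 0.7, 0.6], [0.7, 1.0, 0.4], [0.6, 0.4, 1.0]]
s = [3.0, 4.0, 5.0]
x = [1.0, -2.0, 0.5]
print(phi_43(x, s, r), phi_44(x, s, r))
```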

117. Significance of an observed partial correlation 
Fisher has proved that the sampling distribution of a partial
correlation coefficient† of order $k$, in samples from a normal population,
is of the same form as that of the correlation coefficient for
samples from a bivariate normal population, with the sample
number $N$ reduced by $k$. In particular, when the partial correlation
in the population is zero, the square of the partial correlation
coefficient, $r$, in samples of $N$ sets of values, is a $\beta_1(\tfrac{1}{2}, \tfrac{1}{2}(N-k-2))$
variate. By the argument used in § 89 it then follows that the
statistic $t$ defined by

$$t = \frac{r\sqrt{(N-k-2)}}{\sqrt{(1-r^2)}} \qquad (45)$$

conforms to the $t$ distribution for $(N-k-2)$ d.f. We are thus able
to test the significance of an observed partial correlation.

* Cf. Yule and Kendall, 1937, 1, pp. 2S2-4. † Fisher, 1924, 4.

Example 1. From a random sample of 21 sets of values from a normal
population the calculated value of a partial correlation of order three is 0.40.
Is this consistent with the assumption that the corresponding partial
correlation in the population is zero?

In applying the above test to our assumption we have $\nu = 21 - 3 - 2 = 16$,
and

$$t = \frac{0.4\sqrt{(16)}}{\sqrt{(0.84)}} = \frac{1.6}{0.916} \approx 1.75.$$

This value of $t$ is not significant at the 5 % level, so that the observed
correlation is not significant of correlation in the population.
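The arithmetic of Example 1 may be reproduced with a few lines of Python. In this sketch the function name is an assumption, and the 5 % point of $t$ for 16 d.f. is simply quoted from standard tables rather than computed.

```python
import math

def partial_corr_t(r, n, k):
    """Statistic (45) and its degrees of freedom for a partial correlation r
    of order k observed in a sample of n sets of values."""
    dof = n - k - 2
    t = r * math.sqrt(dof) / math.sqrt(1 - r * r)
    return t, dof

t, dof = partial_corr_t(0.40, 21, 3)

# Two-sided 5 % point of t for 16 d.f., taken from standard tables
# (an assumption of this sketch, not computed here).
T_5_PERCENT_16_DF = 2.12

significant = abs(t) > T_5_PERCENT_16_DF
print(round(t, 2), dof, significant)
```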

From the sampling distribution it follows, as in the case of two
variables, that if Fisher's $z$ transformation of § 94 is applied to the
above partial correlation of order $k$, the statistic $z$ is distributed
nearly normally with variance $1/(N-k-3)$. We are thus able to
test if an observed partial correlation differs significantly from some
assumed value. Similarly, we can test the significance of the
difference of the observed partial correlations in two independent
samples.

Example 2. From independent samples, of 32 and 23 sets of values,
partial correlations of order four are found to be 0.4 and 0.6 respectively.
Examine (i) whether the first value is consistent with the assumption of a
normal population with a corresponding correlation of 0.7, and (ii) whether
the two samples may be regarded as from the same normal population.

(i) From Table 7 we find that the values $r_1 = 0.4$ and $r_0 = 0.7$ correspond
to $z_1 = 0.424$ and $z_0 = 0.868$. The deviation of $z_1$ is therefore 0.444.
The s.e. of $z_1$ is $1/\sqrt{(32-4-3)} = 0.2$. Since the deviation of $z_1$ is greater than
twice the s.e. it is significant. The assumption of a correlation of 0.7 in the
population is thus ruled out.

(ii) Corresponding to $r_1 = 0.4$ and $r_2 = 0.6$ we have $z_1 = 0.424$ and
$z_2 = 0.693$, giving $z_2 - z_1 = 0.27$ nearly. The s.e. of the difference of the $z$'s
is given by

$$\epsilon = \sqrt{(\tfrac{1}{25} + \tfrac{1}{16})} = \sqrt{(0.1025)} = 0.32.$$

The difference $z_2 - z_1$, being less than $\epsilon$, is not significant. The samples may
thus be regarded as from the same population.
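Both parts of Example 2 may be retraced in Python, with `math.atanh` taking the place of the table of $z$. The variable and function names are assumptions of this sketch, and the rule of comparing a deviation with twice its s.e. is the one used in the example.

```python
import math

def z_standard_error(n, k):
    """Approximate s.e. of Fisher's z for a partial correlation of order k
    in a sample of n sets of values: 1 / sqrt(n - k - 3)."""
    return 1.0 / math.sqrt(n - k - 3)

z1 = math.atanh(0.4)   # first sample
z2 = math.atanh(0.6)   # second sample
z0 = math.atanh(0.7)   # assumed population value

se1 = z_standard_error(32, 4)
se2 = z_standard_error(23, 4)

# (i) deviation of z1 from the assumed value, in units of its s.e.
ratio_i = (z0 - z1) / se1

# (ii) difference of the two z's, in units of the s.e. of the difference.
se_diff = math.sqrt(se1**2 + se2**2)
ratio_ii = (z2 - z1) / se_diff

print(ratio_i > 2, ratio_ii > 2)
```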

118. Significance of an observed multiple correlation 

Consider next the significance of an observed multiple correlation
coefficient, $R$, of the variable $x_1$ with the $p$ variables $x_2, x_3, \dots, x_{p+1}$,
calculated from a random sample of $N$ sets of values from a
multivariate normal population. Fisher has found the general sampling
distribution* of $R$, and has shown that it depends, not on the whole
matrix of correlations between the variables, but simply on the
multiple correlation in the population and the sample size, $N$.
In particular, when the multiple correlation coefficient in the
population is zero,† $R^2$ is a $\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N-p-1))$ variate. In virtue
of Theorem VII of § 72 it follows that, in this case, $R^2/(1-R^2)$ is a
$\beta_2(\tfrac{1}{2}p, \tfrac{1}{2}(N-p-1))$ variate; and therefore, by the argument of § 92,
the statistic $F$ defined by

$$F = \frac{R^2}{1-R^2}\cdot\frac{N-p-1}{p} \qquad (46)$$

conforms to the $F$ distribution for

$$\nu_1 = p, \qquad \nu_2 = N-p-1. \qquad (47)$$

To test the hypothesis that the multiple correlation in the population
is zero, we have only to determine from the table of $F$ whether
the value of this statistic calculated from the sample is significant.
This decides the significance of the observed multiple correlation.
We may also remark that, since $R^2$ is a $\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N-p-1))$ variate
in samples from a normal population in which $x_1$ is uncorrelated
with any of the $p$ variables $x_2, x_3, \dots, x_{p+1}$, the mean value of $R^2$ is
given by

$$E(R^2) = \frac{p}{N-1}.$$


The problem may also be approached from the point of view of
analysis of variance and covariance. The estimate $e_{1.(p)}$ of $x_1$, given
by the regression equation of $x_1$ on the $p$ variables $x_2, \dots, x_{p+1}$, is
of the form

$$e_{1.(p)} = b_{12}x_2 + b_{13}x_3 + \dots + b_{1(p+1)}x_{p+1}, \qquad (48)$$

and this is connected with the corresponding deviation $x_{1.(p)}$ by
the equation

$$x_1 = x_{1.(p)} + e_{1.(p)}. \qquad (49)$$

* Fisher, 1928, 1. See also Wilks, 1932, 1.
† See also Fisher, 1924, 5.

The $N$ values of $x_{1.(p)}$ are connected by the $p+1$ normal equations

$$\sum x_{1.(p)} = 0, \qquad \sum x_i x_{1.(p)} = 0, \quad (i = 2, \dots, p+1). \qquad (50)$$

Squaring (49) and summing over the distribution we see that

$$\sum x_1^2 = \sum x_{1.(p)}^2 + \sum e_{1.(p)}^2, \qquad (51)$$

the sum of products vanishing since

$$\sum x_{1.(p)}e_{1.(p)} = \sum x_{1.(p)}(b_{12}x_2 + b_{13}x_3 + \dots) = 0,$$

in virtue of (50). The sum in the first member of (51) is equal to
$NS_1^2$, where $S_1^2$ is the variance of $x_1$ in the sample; while the first
sum on the right has the value $NS_1^2(1 - R^2)$ by (23'). Consequently

$$\sum e_{1.(p)}^2 = NS_1^2R^2. \qquad (52)$$

The equations (50) may be regarded as $p+1$ linear constraints on
the $N$ values $x_{1.(p)}$. Then, on the assumption that these deviations
are normally distributed, and that $R$ is zero in the population, the
sum $\sum x_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with $N-p-1$ d.f., $\sigma^2$ being the
variance of $x_1$ in the population. But, by § 77, $\sum x_1^2/\sigma^2$ is similarly
distributed with $N-1$ d.f.; and therefore, by Theorem II of § 81,
$\sum e_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with $p$ d.f. Thus the two sums on
the right of (51), when divided by $N-p-1$ and $p$ respectively, give
independent and unbiased estimates of $\sigma^2$. Inserting their values
in terms of $R^2$ we see that the quotient of these estimates

$$F = \frac{R^2}{1-R^2}\cdot\frac{N-p-1}{p} \qquad (53)$$

conforms to the $F$ distribution for

$$\nu_1 = p, \qquad \nu_2 = N-p-1.$$

We thus arrive at the same result as before.* (See also Ex. XII, 13.)

Example. In a sample of 25 sets of values from a normal population
$R_{1(234)}$ is found to be 0.4. Show that this is not significant of correlation
in the population between $x_1$ and the variables $x_2$, $x_3$, $x_4$.

Here $p = 3$, $\nu_1 = 3$, $\nu_2 = 21$. Hence

$$F = \frac{0.16}{0.84}\times\frac{21}{3} = \frac{4}{3}.$$

This is not significant, since the 5 % value of $F$ is about 3.1.
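The test of this section reduces to a short calculation, sketched below in Python for the example just given. The function names are assumptions of the sketch; the 5 % value of $F$ is taken from the text as about 3.1 and is not computed.

```python
def multiple_corr_f(r, n, p):
    """Statistic (46) with its degrees of freedom (47), for a multiple
    correlation r of x1 with p other variables in a sample of n sets."""
    dof1, dof2 = p, n - p - 1
    f = (r * r / (1 - r * r)) * dof2 / dof1
    return f, dof1, dof2

def mean_r_squared_when_uncorrelated(n, p):
    """Mean of R^2 when the population multiple correlation is zero,
    i.e. the mean p / (n - 1) of the beta variate of the first kind."""
    return p / (n - 1)

f, v1, v2 = multiple_corr_f(0.4, 25, 3)
F_5_PERCENT = 3.1  # approximate 5 % value for (3, 21) d.f., as quoted in the text

print(f, v1, v2, f > F_5_PERCENT)
print(mean_r_squared_when_uncorrelated(25, 3))
```

The second function shows that, even with no correlation in the population, the sample value of $R^2$ has a positive mean which grows with the number of variables.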


* For the distributions of various statistics occurring in this chapter see 
Bartlett, 1933, 3. 


EXAMPLES XII

1. In a trivariate distribution it is found that

$$\sigma_1 = 3, \quad \sigma_2 = 4, \quad \sigma_3 = 5; \qquad r_{23} = 0.40, \quad r_{31} = 0.60, \quad r_{12} = 0.70.$$

Prove that the partial correlations are

$$r_{23.1} = -0.035, \quad r_{31.2} = 0.49, \quad r_{12.3} = 0.63,$$

and that, if the variates are measured from their means, the linear
regression equations are

$$x_1 = 0.41x_2 + 0.23x_3, \quad x_2 = 0.96x_1 - 0.025x_3, \quad x_3 = 1.04x_1 - 0.05x_2.$$

Show also that

$$\sigma_{1.23} = 1.87, \quad \sigma_{2.31} = 2.86, \quad \sigma_{3.12} = 4.00,$$

$$R_{1(23)} = 0.78, \quad R_{2(31)} = 0.70, \quad R_{3(12)} = 0.60.$$

2. Show that

$$\sigma_{1.23}\sigma_{2.31}/r_{12.3} = -\omega\sigma_1\sigma_2/\omega_{12},$$

and, more generally,

$$\sigma_{1.23(k)}\sigma_{2.31(k)}/r_{12.3(k)} = -\omega\sigma_1\sigma_2/\omega_{12}.$$

3. From § 114 (38), deduce that

$$1 - r_{1n.23\dots(n-1)}^2 = \frac{1 - R_{1(23\dots n)}^2}{1 - R_{1(23\dots n-1)}^2},$$

which shows how a partial correlation coefficient of order $n-2$ may
be expressed in terms of multiple correlation coefficients of orders
$n-1$ and $n-2$.

4. Prove the identity

$$b_{12.3}b_{23.1}b_{31.2} = r_{12.3}r_{23.1}r_{31.2}.$$

5. Prove the formula

$$b_{12} = (b_{12.3} + b_{13.2}b_{32.1})/(1 - b_{13.2}b_{31.2}),$$

expressing a regression coefficient in terms of coefficients of higher
order. Write down the corresponding formula with subscripts 1
and 2 interchanged, multiply together the two equations and take
the square root, thus obtaining

$$r_{12} = (r_{12.3} + r_{13.2}r_{23.1})/\sqrt{\{(1 - r_{13.2}^2)(1 - r_{23.1}^2)\}},$$

expressing a correlation coefficient in terms of coefficients of higher
order.

6. Prove the formulae of Ex. 5 with a group $(k)$ of secondary
subscripts added to each of the coefficients.

7. Verify the values of $\phi$ expressed by § 116 (43) and (44).

8. Show that, if $x_3 = ax_1 + bx_2$, the three partial correlations are
numerically equal to unity, $r_{13.2}$ having the sign of $a$, $r_{23.1}$ the sign
of $b$, and $r_{12.3}$ the opposite sign to $a/b$.

9. For a sample of 30 sets of values from a normal population,
$R_{1(23)}$ is found to be 0.5. Show that this is significant of correlation
in the population between $x_1$ and $x_2$, $x_3$. ($F = 4.5$, $\nu_1 = 2$, $\nu_2 = 27$.)

10. Show that a partial correlation $r_{12.34} = 0.6$, in a sample of
20 sets of values from a normal population, is significant at the 5 %
level.

11. Two independent samples, of 46 and 36 sets of values, give
corresponding partial correlations of order three as 0.41 and 0.66
respectively. Show that this is not inconsistent with the assumption
that the samples are from the same normal population. Show also
that an estimate of the partial correlation in the population, by the
method of § 96, is 0.53.

12. Show that, corresponding to the first sample in § 117, Ex. 2,
the 95 % fiducial limits for the partial correlation in the population
are 0.03 and 0.67.

13. Deduce from the argument on p. 259 that, when the multiple
correlation in the normal population is zero, $R^2/(1-R^2)$
is a $\beta_2(\tfrac{1}{2}p, \tfrac{1}{2}(N-p-1))$ variate and therefore, by § 72, $R^2$ is a
$\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N-p-1))$ variate.

14. With the notation of § 118 show that, when the multiple
correlation in the normal population is zero, the expected value of
$R$ in the sample is $\Gamma(\tfrac{1}{2}(p+1))\,\Gamma(\tfrac{1}{2}(N-1))/\{\Gamma(\tfrac{1}{2}p)\,\Gamma(\tfrac{1}{2}N)\}$.